Abstract

In this paper linear canonical correlation analysis (LCCA) is generalized by applying a structured
transform to the joint probability distribution of the considered pair of random vectors, i.e., a
transformation of the joint probability measure defined on their joint observation space. This framework,
called measure transformed canonical correlation analysis (MTCCA), applies LCCA to the data after
transformation of the joint probability measure. We show that judicious choice of the transform leads to
a modified canonical correlation analysis, which, in contrast to LCCA, is capable of detecting non-linear
relationships between the considered pair of random vectors. Unlike kernel canonical correlation analysis,
where the transformation is applied to the random vectors, in MTCCA the transformation is applied to
their joint probability distribution. This results in performance advantages and reduced implementation
complexity. The proposed approach is illustrated for graphical model selection in simulated data having
non-linear dependencies, and for measuring long-term associations between companies traded in the
NASDAQ and NYSE stock markets.



Index Terms 

Association analysis, canonical correlation analysis, graphical model selection, multivariate data 
analysis, probability measure transform. 



I. Introduction 

Linear canonical correlation analysis (LCCA) [1] is a technique for multivariate data analysis and 
dimensionality reduction, which quantifies the linear associations between a pair of random vectors. In 
particular, LCCA generates a sequence of pairwise unit variance linear combinations of the considered
random vectors, such that the Pearson correlation coefficient between the elements of each pair is maximal,
and each pair is uncorrelated with its predecessors. The coefficients of these linear combinations, called the
linear canonical directions, give insight into the underlying relationships between the random vectors. They 
are easily obtained by solving a simple generalized eigenvalue decomposition (GEVD) problem, which 
only involves the covariance and cross-covariance matrices of the considered random vectors. LCCA has 
been applied to blind source separation [3], image set matching [ ], direction-of-arrival estimation [5], 
[6], data fusion and group inference in medical imaging data [ ], localization of visual events associated 
with sound sources [ ], audio-video synchronization [ ], undersea target classification [ ] among others. 

The Pearson correlation coefficient is only sensitive to linear associations between random variables. 
Therefore, in cases where the considered random vectors are statistically dependent yet uncorrelated,
LCCA is not an informative tool. 

In order to overcome the linear dependence limitation, several generalizations of LCCA have been
proposed in the literature. In [ ] an information-theoretic approach to canonical correlation analysis,
called ICCA, was proposed. This method generates a sequence of pairwise unit variance linear combinations
of the considered random vectors, such that the mutual-information (MI) [ ] between the elements of
each pair is maximal, and each pair is uncorrelated with its predecessors. Since the MI is a general
measure of statistical dependence, which is sensitive to non-linear relationships, ICCA [11] is capable
of capturing pairs of linear combinations exhibiting non-linear dependence. However, in contrast to LCCA,
ICCA does not reduce to a simple GEVD problem. Indeed, in [11] each pair of linear combinations
must be obtained separately via an iterative Newton-Raphson [13] algorithm, which may converge to
undesired local maxima. Moreover, each step of the Newton-Raphson algorithm involves re-estimation of
the MI in a non-parametric manner at a potentially high computational cost.

Another approach to non-linear generalization of LCCA is kernel canonical correlation analysis (KCCA)
[14]-[16]. KCCA applies LCCA to high-dimensional non-linear transformations of the considered random
vectors that map them into some reproducing kernel Hilbert spaces. Although the KCCA approach can be
successful in extracting non-linear relations [16], [17]-[19], it suffers from the following drawbacks. First,
the high-dimensional mappings may have high computational complexity. Second, the method is highly 
prone to over-fitting errors, and requires regularization of the covariance matrices of the transformed 
random vectors to increase numerical stability. Finally, the non-linear mappings of the random vectors 
may mask the dependencies between their original coordinates. 

In this paper we propose a different non-linear generalization of LCCA called measure transformed 
canonical correlation analysis (MTCCA). We apply a structured transform to the joint probability
distribution of the considered pair of random vectors, i.e., a transformation of the joint probability measure
defined on their joint observation space. The proposed transform is structured by a pair of non-negative
functions called the MT-functions. It preserves statistical independence and maps the joint probability
distribution into a set of probability measures on the joint observation space. By modifying the
MT-functions, classes of measure transformations with different properties can be obtained. Two types of
MT-functions, the exponential and the Gaussian, are developed in this paper. The former has a translation
invariance property while the latter has a localization property.

MTCCA applies LCCA to the considered pair of random vectors under the proposed probability 
measure transform. By modifying the MT-functions the correlation coefficient under the transformed 
probability measure, called the MT-correlation coefficient, is modified, resulting in a new general frame- 
work for canonical correlation analysis. In MTCCA, the MT-correlation coefficients between the elements 
of each generated pair of linear combinations are called the MT-canonical correlation coefficients. 

The MT-functions are selected from exponential and Gaussian families of functions parameterized by 
scale and location parameters. Under these function classes it is shown that pairs of linear combinations 
with non-linear dependence can be detected by MTCCA. The parameters of the MT-functions are selected 
via maximization of a lower bound on the largest MT-canonical correlation coefficient. We show that, 
for these selected parameters, the corresponding largest MT-canonical correlation coefficient constitutes 
a measure for statistical independence under the original probability distribution. In this case it is also 
shown that the considered random vectors are statistically independent under both transformed and original 
probability distributions if and only if they are uncorrelated under the transformed probability distribution. 

In the paper an empirical implementation of MTCCA is proposed that uses strongly consistent esti- 
mators of the measure transformed covariance and cross-covariance matrices of the considered random 
vectors. 

The MTCCA approach has the following advantages over LCCA, ICCA, and KCCA discussed
above: 1) In contrast to LCCA, MTCCA is capable of detecting non-linear dependencies. Moreover,
under appropriate selection of the MT-functions, the largest MT-canonical correlation coefficient is a
measure of statistical independence between the considered random vectors. 2) In comparison to ICCA,
MTCCA is easier to implement for the following reasons. First, it reduces to a simple GEVD problem,
which only involves the measure transformed covariance and cross-covariance matrices of the considered
random vectors. Second, while MTCCA with exponential and Gaussian MT-functions involves a single
maximization for choosing the MT-function parameters, ICCA involves a sequence of maximization
problems, each having the same dimensionality as in MTCCA. 3) In the paper we show that, unlike
the empirical ICCA and KCCA, the computational complexity of the empirical MTCCA is linear in the
sample size, which makes it favorable in large sample size scenarios. 4) Unlike KCCA, MTCCA does
not expand the dimensions of the random vectors, nor does it require regularization of their measure 
transformed covariance matrices. 5) Finally, unlike KCCA, in MTCCA the original coordinates of the 
observation vectors are retained after the probability measure transform. Therefore, MTCCA can be easily 
applied to variable selection [20] by discarding a subset of the variables for which the corresponding 
entries of the measure transformed canonical directions are practically zero. 

The proposed approach is illustrated for two applications. The first is a simulation of graphical models
with known dependency structure. In this simulated example we show that, similarly to ICCA,
MTCCA outperforms LCCA in selecting a valid linear/non-linear graphical model topology. The second
application is the construction of networks that analyze long-term associations between companies traded in
the NASDAQ and NYSE stock markets. We show that MTCCA and KCCA better associate companies
in the same sector (technology, pharmaceutical, financial) than do LCCA and ICCA. Furthermore,
MTCCA is able to achieve this by finding strong non-linear dependencies between the daily log-returns
of these companies.

The paper is organized as follows. In Section II, LCCA is reviewed. In Section III, LCCA is generalized 
by applying a transform to the joint probability distribution. Selection of the MT-functions associated 
with the transform is discussed in Section IV. In Section V, empirical implementation of MTCCA is 
obtained. In Section VI, the proposed approach is illustrated via simulation experiments. In Section VII,
the main points of this contribution are summarized. The propositions and theorems stated throughout 
the paper are proved in the Appendix. 



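II. Linear canonical correlation analysis: Review

A. Preliminaries

Let $X$ and $Y$ denote two random vectors, whose observation spaces are given by
$\mathcal{X} \subseteq \mathbb{R}^p$ and $\mathcal{Y} \subseteq \mathbb{R}^q$, respectively. We define the measure space
$(\mathcal{X} \times \mathcal{Y}, \mathcal{S}_{\mathcal{X} \times \mathcal{Y}}, P_{XY})$, where
$\mathcal{S}_{\mathcal{X} \times \mathcal{Y}}$ is a $\sigma$-algebra over $\mathcal{X} \times \mathcal{Y}$,
and $P_{XY}$ is the joint probability measure on $\mathcal{S}_{\mathcal{X} \times \mathcal{Y}}$. The marginal
probability measures of $P_{XY}$ on $\mathcal{S}_{\mathcal{X}}$ and $\mathcal{S}_{\mathcal{Y}}$ are denoted by
$P_X$ and $P_Y$, where $\mathcal{S}_{\mathcal{X}}$ and $\mathcal{S}_{\mathcal{Y}}$ are the $\sigma$-algebras over
$\mathcal{X}$ and $\mathcal{Y}$, respectively. Let $g(\cdot, \cdot)$ denote an integrable scalar function on
$\mathcal{X} \times \mathcal{Y}$. The expectation of $g(X, Y)$ under $P_{XY}$ is defined as

$$E[g(X, Y); P_{XY}] \triangleq \int_{\mathcal{X} \times \mathcal{Y}} g(x, y)\, dP_{XY}(x, y), \tag{1}$$

where $x \in \mathcal{X}$ and $y \in \mathcal{Y}$. The random vectors $X$ and $Y$ will be said to be statistically
independent under $P_{XY}$ if

$$E[g_1(X) g_2(Y); P_{XY}] = E[g_1(X); P_X]\, E[g_2(Y); P_Y] \tag{2}$$

for all integrable scalar functions $g_1(\cdot)$, $g_2(\cdot)$ on $\mathcal{X}$ and $\mathcal{Y}$, respectively. The
random vectors $X$ and $Y$ will be said to be uncorrelated under $P_{XY}$ if

$$E[X Y^T; P_{XY}] = E[X; P_X]\, E[Y^T; P_Y], \tag{3}$$

where $(\cdot)^T$ denotes the transpose operator.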

B. The LCCA procedure 

LCCA generates a sequence of pairwise unit-variance linear combinations $(a_k^T X, b_k^T Y)$,
$k = 1, \ldots, r = \min(p, q)$, in the following manner. The first pair $(a_1^T X, b_1^T Y)$ is determined by
maximizing the Pearson correlation coefficient between $a^T X$ and $b^T Y$ over $a \in \mathbb{R}^p$ and
$b \in \mathbb{R}^q$ with the constraint that both $a^T X$ and $b^T Y$ have unit variance. Similarly, the $k$-th pair
$(a_k^T X, b_k^T Y)$ ($1 < k \le r$) is determined by maximizing the Pearson correlation coefficient between
$a^T X$ and $b^T Y$ over $a \in \mathbb{R}^p$ and $b \in \mathbb{R}^q$ with the constraints that both $a^T X$ and
$b^T Y$ have unit variance and $(a^T X, b^T Y)$ are uncorrelated with all the previously obtained pairs
$(a_l^T X, b_l^T Y)$, $l = 1, \ldots, k - 1$. The pairs $(a_k, b_k)$ and $(a_k^T X, b_k^T Y)$ are called
the $k$-th order linear canonical directions and the $k$-th order linear canonical variates, respectively.
The Pearson correlation coefficient between $a_k^T X$ and $b_k^T Y$ is called the $k$-th order linear canonical
correlation coefficient.

The Pearson correlation coefficient between $a^T X$ and $b^T Y$ under $P_{XY}$ is given by

$$\mathrm{Corr}[a^T X, b^T Y; P_{XY}] = \frac{\mathrm{Cov}[a^T X, b^T Y; P_{XY}]}{\sqrt{\mathrm{Var}[a^T X; P_X]}\, \sqrt{\mathrm{Var}[b^T Y; P_Y]}} = \frac{a^T \Sigma_{XY} b}{\sqrt{a^T \Sigma_X a}\, \sqrt{b^T \Sigma_Y b}}, \tag{4}$$

where $\mathrm{Var}[\cdot; P_X]$ and $\mathrm{Cov}[\cdot, \cdot; P_{XY}]$ denote the variance and covariance under
$P_X$ and $P_{XY}$, respectively. The last equality in (4) can be easily verified using the basic definitions of
variance and covariance, where $\Sigma_X \in \mathbb{R}^{p \times p}$, $\Sigma_Y \in \mathbb{R}^{q \times q}$ and
$\Sigma_{XY} \in \mathbb{R}^{p \times q}$ denote the covariance matrix of $X$ under $P_X$, the covariance matrix of
$Y$ under $P_Y$, and their cross-covariance matrix under $P_{XY}$, respectively, and it is assumed that
$\Sigma_X$ and $\Sigma_Y$ are non-singular.

Hence, LCCA solves the following constrained maximization sequentially over $k = 1, \ldots, r$:

$$\begin{aligned}
\rho_k(\Sigma_X, \Sigma_Y, \Sigma_{XY}) = {} & \max_{a, b}\ a^T \Sigma_{XY} b, \\
& \text{s.t.}\ \ a^T \Sigma_X a = b^T \Sigma_Y b = 1, \\
& \text{and}\ \ a^T \Sigma_{XY} b_l = b^T \Sigma_{XY}^T a_l = a^T \Sigma_X a_l = b^T \Sigma_Y b_l = 0 \quad \forall\ 1 \le l < k,
\end{aligned} \tag{5}$$

where $\rho_k(\Sigma_X, \Sigma_Y, \Sigma_{XY})$ denotes the $k$-th order linear canonical correlation coefficient.
Since the number of constraints in (5) increases with $k$, it is implied that the linear canonical correlation
coefficients satisfy the following order relation:
$1 \ge \rho_1(\Sigma_X, \Sigma_Y, \Sigma_{XY}) \ge \cdots \ge \rho_r(\Sigma_X, \Sigma_Y, \Sigma_{XY}) \ge 0$.

It is well known that the constrained maximization problem in (5) reduces to the set of $r$ distinct
solutions of the following generalized eigenvalue problem [21]

$$\begin{bmatrix} 0 & \Sigma_{XY} \\ \Sigma_{XY}^T & 0 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \rho \begin{bmatrix} \Sigma_X & 0 \\ 0 & \Sigma_Y \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix}, \tag{6}$$

where $\rho = \rho_k(\Sigma_X, \Sigma_Y, \Sigma_{XY})$ is the $k$-th largest generalized eigenvalue of the pencil in (6), and
$[a^T, b^T]^T = [a_k^T, b_k^T]^T$ is its corresponding generalized eigenvector.
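
As an illustration of the reduction above, the following minimal sketch computes the linear canonical correlation coefficients and directions from given covariance matrices. It assumes NumPy and, instead of calling a generalized eigenvalue solver on the pencil in (6), uses the equivalent whitening route: the singular values of $L_X^{-1} \Sigma_{XY} L_Y^{-T}$, with Cholesky factors $\Sigma_X = L_X L_X^T$ and $\Sigma_Y = L_Y L_Y^T$, coincide with the non-negative generalized eigenvalues. The function name `lcca` and the numerical matrices are hypothetical and chosen only for illustration.

```python
import numpy as np

def lcca(sigma_x, sigma_y, sigma_xy):
    """Canonical correlations and directions from covariance matrices.

    Whitening route: the singular values of Lx^{-1} Sxy Ly^{-T} are the
    canonical correlations, where Sx = Lx Lx^T and Sy = Ly Ly^T.
    """
    lx = np.linalg.cholesky(sigma_x)          # lower-triangular factor of Sx
    ly = np.linalg.cholesky(sigma_y)          # lower-triangular factor of Sy
    k = np.linalg.solve(lx, sigma_xy)         # Lx^{-1} Sxy
    k = np.linalg.solve(ly, k.T).T            # Lx^{-1} Sxy Ly^{-T}
    u, rho, vt = np.linalg.svd(k, full_matrices=False)
    a = np.linalg.solve(lx.T, u)              # columns a_k = Lx^{-T} u_k
    b = np.linalg.solve(ly.T, vt.T)           # columns b_k = Ly^{-T} v_k
    return rho, a, b

# Hypothetical covariance matrices (p = q = 2), for illustration only.
sigma_x = np.array([[2.0, 0.3], [0.3, 1.0]])
sigma_y = np.array([[1.0, -0.2], [-0.2, 1.5]])
sigma_xy = np.array([[0.5, 0.1], [0.0, 0.3]])

rho, a, b = lcca(sigma_x, sigma_y, sigma_xy)
```

The columns of `a` and `b` satisfy the unit-variance and uncorrelatedness constraints in (5), and `rho` is returned in decreasing order, matching the order relation stated above.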



III. Measure transformed canonical correlation analysis 

In this section LCCA is generalized by applying a transform to the joint probability measure $P_{XY}$.
First, we derive a transform that maps $P_{XY}$ into a set of joint probability measures
$\{Q_{XY}^{(u,v)}\}$ on $\mathcal{S}_{\mathcal{X} \times \mathcal{Y}}$ that preserve statistical independence of
$X$ and $Y$ under $P_{XY}$. The MTCCA method is then obtained by applying LCCA to $X$ and $Y$ under the
transformed probability measure $Q_{XY}^{(u,v)}$.



A. Transformation of the joint probability measure $P_{XY}$

Definition 1. Given two non-negative functions $u : \mathbb{R}^p \to \mathbb{R}$ and $v : \mathbb{R}^q \to \mathbb{R}$ satisfying

$$0 < E[u(X) v(Y); P_{XY}] < \infty, \tag{7}$$

a transform on the joint probability measure $P_{XY}$ is defined via the following relation

$$Q_{XY}^{(u,v)}(A) = T_{u,v}[P_{XY}](A) = \int_A \varphi_{u,v}(x, y)\, dP_{XY}(x, y), \tag{8}$$

where $A \in \mathcal{S}_{\mathcal{X} \times \mathcal{Y}}$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$, and

$$\varphi_{u,v}(x, y) \triangleq \frac{u(x) v(y)}{E[u(X) v(Y); P_{XY}]}. \tag{9}$$

The functions $u(\cdot)$ and $v(\cdot)$, associated with the transform $T_{u,v}[\cdot]$, are called the MT-functions.

In the following Proposition, some properties of the measure transform (8) are given.

Proposition 1. Let $Q_{XY}^{(u,v)}$ be defined by relation (8). Then

1) $Q_{XY}^{(u,v)}$ is a probability measure on $\mathcal{S}_{\mathcal{X} \times \mathcal{Y}}$.

2) $Q_{XY}^{(u,v)}$ is absolutely continuous w.r.t. $P_{XY}$, with Radon-Nikodym derivative [33] given by

$$\frac{dQ_{XY}^{(u,v)}(x, y)}{dP_{XY}(x, y)} = \varphi_{u,v}(x, y). \tag{10}$$

3) If $X$ and $Y$ are statistically independent under $P_{XY}$, then they are statistically independent under
$Q_{XY}^{(u,v)}$.

4) Assume that the MT-functions $u(\cdot)$ and $v(\cdot)$ are strictly positive. If $X$ and $Y$ are statistically
independent under $Q_{XY}^{(u,v)}$, then they are statistically independent under $P_{XY}$.

[A proof is given in Appendix A]

By modifying the MT-functions $u(\cdot)$ and $v(\cdot)$, such that the conditions in Definition 1 are satisfied,
an infinite set of joint probability measures on $\mathcal{S}_{\mathcal{X} \times \mathcal{Y}}$ can be obtained.
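
For intuition, the measure transform can be written out for a pair of discrete random variables, where the integral over $A$ becomes a finite sum. The following minimal sketch assumes NumPy; the function name `measure_transform`, the probability tables and the MT-function values are hypothetical. It transforms a joint probability mass function that factorizes into its marginals and exposes the marginals of the result, so that the independence-preserving property of the transform can be checked numerically.

```python
import numpy as np

def measure_transform(p_xy, u, v):
    """Transform a discrete joint pmf by MT-functions u (on x) and v (on y).

    p_xy[i, j] = P(X = x_i, Y = y_j); u[i] = u(x_i) >= 0; v[j] = v(y_j) >= 0.
    Returns q_xy[i, j] = p_xy[i, j] * u[i] * v[j] / E[u(X) v(Y)].
    """
    weighted = p_xy * np.outer(u, v)
    norm = weighted.sum()                     # E[u(X) v(Y); P_XY]
    if not (0.0 < norm < np.inf):
        raise ValueError("need 0 < E[u(X) v(Y)] < infinity")
    return weighted / norm

# Independent X and Y: the joint pmf is the outer product of the marginals.
p_x = np.array([0.2, 0.5, 0.3])
p_y = np.array([0.6, 0.4])
p_xy = np.outer(p_x, p_y)

u = np.array([1.0, 2.0, 0.5])                 # hypothetical values u(x_i)
v = np.array([3.0, 1.0])                      # hypothetical values v(y_j)

q_xy = measure_transform(p_xy, u, v)
q_x, q_y = q_xy.sum(axis=1), q_xy.sum(axis=0)  # marginals of the transformed pmf
```

In this example `q_xy` again equals the outer product of its marginals, in line with Property 3 of Proposition 1.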



B. The MTCCA procedure 

MTCCA generates a sequence of pairwise linear combinations $(a_k^T X, b_k^T Y)$, $k = 1, \ldots, r = \min(p, q)$,
that have the following properties under the transformed probability measure $Q_{XY}^{(u,v)}$: $a_k^T X$ and
$b_k^T Y$ have unit variance, the correlation coefficient between $a_k^T X$ and $b_k^T Y$ is maximal, and
$(a_k^T X, b_k^T Y)$ are uncorrelated with $(a_l^T X, b_l^T Y)$ for all $1 \le l < k$. In MTCCA, the pairs
$(a_k, b_k)$ and $(a_k^T X, b_k^T Y)$ are called the $k$-th order MT-canonical directions and the $k$-th order
MT-canonical variates, respectively. The correlation coefficient between $a_k^T X$ and $b_k^T Y$ under
$Q_{XY}^{(u,v)}$ is called the $k$-th order MT-canonical correlation coefficient.

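The correlation coefficient between $a^T X$ and $b^T Y$ under $Q_{XY}^{(u,v)}$ is given by

$$\mathrm{Corr}[a^T X, b^T Y; Q_{XY}^{(u,v)}] = \frac{\mathrm{Cov}[a^T X, b^T Y; Q_{XY}^{(u,v)}]}{\sqrt{\mathrm{Var}[a^T X; Q_X^{(u,v)}]}\, \sqrt{\mathrm{Var}[b^T Y; Q_Y^{(u,v)}]}} = \frac{a^T \Sigma_{XY}^{(u,v)} b}{\sqrt{a^T \Sigma_X^{(u,v)} a}\, \sqrt{b^T \Sigma_Y^{(u,v)} b}}, \tag{11}$$

where $\mathrm{Corr}[\cdot, \cdot; Q_{XY}^{(u,v)}]$ is called the MT-correlation coefficient, and the measures
$Q_X^{(u,v)}$ and $Q_Y^{(u,v)}$ are the marginal probability measures of $Q_{XY}^{(u,v)}$ on
$\mathcal{S}_{\mathcal{X}}$ and $\mathcal{S}_{\mathcal{Y}}$, respectively. The matrices $\Sigma_X^{(u,v)}$,
$\Sigma_Y^{(u,v)}$ and $\Sigma_{XY}^{(u,v)}$ denote the covariance matrix of $X$ under $Q_X^{(u,v)}$, the covariance
matrix of $Y$ under $Q_Y^{(u,v)}$, and their cross-covariance matrix under $Q_{XY}^{(u,v)}$, respectively, where it
is assumed that $\Sigma_X^{(u,v)}$ and $\Sigma_Y^{(u,v)}$ are non-singular.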

Using (1) and (10) it can be shown that
$E[G(X, Y); Q_{XY}^{(u,v)}] = E[G(X, Y) \varphi_{u,v}(X, Y); P_{XY}]$,
where $G(X, Y)$ is some arbitrary matrix function of $X$ and $Y$. Therefore, one can easily verify that

$$\Sigma_X^{(u,v)} = E[X X^T \varphi_{u,v}(X, Y); P_{XY}] - E[X \varphi_{u,v}(X, Y); P_{XY}]\, E[X^T \varphi_{u,v}(X, Y); P_{XY}], \tag{12}$$

$$\Sigma_Y^{(u,v)} = E[Y Y^T \varphi_{u,v}(X, Y); P_{XY}] - E[Y \varphi_{u,v}(X, Y); P_{XY}]\, E[Y^T \varphi_{u,v}(X, Y); P_{XY}], \tag{13}$$

and

$$\Sigma_{XY}^{(u,v)} = E[X Y^T \varphi_{u,v}(X, Y); P_{XY}] - E[X \varphi_{u,v}(X, Y); P_{XY}]\, E[Y^T \varphi_{u,v}(X, Y); P_{XY}]. \tag{14}$$

Equations (12)-(14) imply that $\Sigma_X^{(u,v)}$, $\Sigma_Y^{(u,v)}$ and $\Sigma_{XY}^{(u,v)}$ are weighted covariance
and cross-covariance matrices of $X$ and $Y$ under $P_{XY}$, with weighting function $\varphi_{u,v}(\cdot, \cdot)$.

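MTCCA solves the following constrained maximization sequentially over $k = 1, \ldots, r$:

$$\begin{aligned}
\rho_k(\Sigma_X^{(u,v)}, \Sigma_Y^{(u,v)}, \Sigma_{XY}^{(u,v)}) = {} & \max_{a, b}\ a^T \Sigma_{XY}^{(u,v)} b, \\
& \text{s.t.}\ \ a^T \Sigma_X^{(u,v)} a = b^T \Sigma_Y^{(u,v)} b = 1, \\
& \text{and}\ \ a^T \Sigma_{XY}^{(u,v)} b_l = b^T \Sigma_{XY}^{(u,v)T} a_l = a^T \Sigma_X^{(u,v)} a_l = b^T \Sigma_Y^{(u,v)} b_l = 0 \quad \forall\ 1 \le l < k,
\end{aligned} \tag{15}$$

where $\rho_k(\Sigma_X^{(u,v)}, \Sigma_Y^{(u,v)}, \Sigma_{XY}^{(u,v)})$ denotes the $k$-th order MT-canonical correlation
coefficient. Since the number of constraints in (15) increases with $k$, the MT-canonical correlation coefficients
satisfy the following order relation:
$1 \ge \rho_1(\Sigma_X^{(u,v)}, \Sigma_Y^{(u,v)}, \Sigma_{XY}^{(u,v)}) \ge \cdots \ge \rho_r(\Sigma_X^{(u,v)}, \Sigma_Y^{(u,v)}, \Sigma_{XY}^{(u,v)}) \ge 0$.

Similarly to (5), the constrained maximization problem in (15) reduces to the following generalized
eigenvalue problem

$$\begin{bmatrix} 0 & \Sigma_{XY}^{(u,v)} \\ \Sigma_{XY}^{(u,v)T} & 0 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \rho \begin{bmatrix} \Sigma_X^{(u,v)} & 0 \\ 0 & \Sigma_Y^{(u,v)} \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix}, \tag{16}$$

where $\rho = \rho_k(\Sigma_X^{(u,v)}, \Sigma_Y^{(u,v)}, \Sigma_{XY}^{(u,v)})$ is the $k$-th largest generalized eigenvalue
of the pencil in (16), and $[a^T, b^T]^T = [a_k^T, b_k^T]^T$ is its corresponding generalized eigenvector.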

By modifying the MT-functions $u(\cdot)$ and $v(\cdot)$, such that the condition in (7) is satisfied, the
MT-correlation coefficient under $Q_{XY}^{(u,v)}$ is modified, resulting in a family of canonical correlation
analyses that generalizes the LCCA described in Subsection II-B. In particular, for $u(x) = 1$ and $v(y) = 1$ we
have $Q_{XY}^{(u,v)} = P_{XY}$ and
$\mathrm{Corr}[a^T X, b^T Y; Q_{XY}^{(u,v)}] = \mathrm{Corr}[a^T X, b^T Y; P_{XY}]$, and LCCA is obtained.

Other choices of $u(\cdot)$ and $v(\cdot)$ are discussed below.

IV. Selection of the MT-functions 

In this section we parameterize the MT-functions $u(x; s)$ and $v(y; t)$ with parameters $s \in \mathbb{R}^p$ and
$t \in \mathbb{R}^q$ under the exponential and Gaussian families of functions. This will result in the corresponding
cross-covariance matrix $\Sigma_{XY}^{(u,v)}(s, t)$ gaining sensitivity to non-linear relationships between the
entries of $X$ and $Y$. Optimal choice of the parameters $s$ and $t$ is also discussed.

A. Exponential MT-functions

Let $u_E(\cdot; \cdot)$ and $v_E(\cdot; \cdot)$ be defined as the parameterized functions

$$u_E(x; s) = \exp(s^T x) \quad \text{and} \quad v_E(y; t) = \exp(t^T y), \tag{17}$$

where $s \in \mathbb{R}^p$ and $t \in \mathbb{R}^q$. Using (9), (14) and (17) one can easily verify that the
cross-covariance matrix of $X$ and $Y$ under $Q_{XY}^{(u_E, v_E)}$ takes the form

$$\Sigma_{XY}^{(u_E, v_E)}(s, t) = \frac{\partial^2 \log M_{XY}(s, t)}{\partial s\, \partial t^T}, \tag{18}$$

where

$$M_{XY}(s, t) = E[\exp(s^T X + t^T Y); P_{XY}] \tag{19}$$

is the joint moment generating function of $X$ and $Y$, and it is assumed that $M_{XY}(s, t)$ is finite in some
open region in $\mathbb{R}^p \times \mathbb{R}^q$ containing the origin. Note that the cross-covariance matrix in (18)
involves higher-order statistics of $X$ and $Y$. Additionally, observe that $\Sigma_{XY}^{(u_E, v_E)}(s, t)$ reduces to
the standard cross-covariance matrix $\Sigma_{XY}$ for $s = 0$ and $t = 0$. Finally, note that the quantity in (18)
has been proposed in [22]-[28] for blind source separation, blind channel estimation, blind channel equalization,
and auto-regression parameter estimation. To the best of our knowledge this paper is the first to propose
this quantity for generalizing LCCA.

In the following Theorem, which follows directly from (18) and the properties of $M_{XY}(s, t)$ [29], [30],
one sees that $\Sigma_{XY}^{(u_E, v_E)}(s, t)$ preserves statistical independence and can capture non-linear
dependencies when they exist.

Theorem 1. Let $\mathcal{U}$ denote an arbitrary open region in $\mathbb{R}^p \times \mathbb{R}^q$ containing the
origin, and assume that $M_{XY}(s, t)$ is finite on $\mathcal{U}$. The random vectors $X$ and $Y$ are statistically
independent under the joint probability measure $P_{XY}$ if and only if

$$\Sigma_{XY}^{(u_E, v_E)}(s, t) = 0 \quad \forall\ (s, t) \in \mathcal{U}. \tag{20}$$

[A proof is given in Appendix B]

The "if" part is the interesting part of the theorem, since the "only if" part follows directly from Property 3
of Proposition 1. In particular, if $X$ and $Y$ are statistically dependent under $P_{XY}$, then there exist
$a \in \mathbb{R}^p$, $b \in \mathbb{R}^q$, $s \in \mathbb{R}^p$ and $t \in \mathbb{R}^q$ such that
$a^T \Sigma_{XY}^{(u_E, v_E)}(s, t)\, b \neq 0$. Thus, (11) implies that if $X$ and $Y$ are statistically dependent
under $P_{XY}$ then there exist linear combinations of the form $a^T X$ and $b^T Y$ whose
MT-correlation coefficient under $Q_{XY}^{(u_E, v_E)}$ is non-zero.
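
A simple numerical illustration of this point, under assumptions that are ours rather than the paper's: take a scalar standard normal $X$ and $Y = X^2$, which are dependent but uncorrelated. With exponential MT-functions and $t = 0$, the transformed measure makes $X$ normal with mean $s$ and unit variance, so the transformed covariance between $X$ and $X^2$ equals $2s$. The sketch below assumes NumPy and estimates both covariances by plain weighted sample averages; the helper `weighted_cov`, the sample size and the value $s = 0.5$ are hypothetical choices.

```python
import numpy as np

def weighted_cov(x, y, w):
    """Covariance of scalar samples x, y under weights w (rescaled to mean 1)."""
    w = w / w.mean()
    mx, my = np.mean(w * x), np.mean(w * y)
    return np.mean(w * x * y) - mx * my

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)
y = x ** 2                                    # depends on x, yet uncorrelated with it

s, t = 0.5, 0.0                               # illustrative parameter choice
w = np.exp(s * x + t * y)                     # product of exponential MT-functions

cov_plain = weighted_cov(x, y, np.ones_like(x))   # ordinary covariance, near 0
cov_mt = weighted_cov(x, y, w)                    # transformed covariance, near 2 * s
```

The ordinary covariance is close to zero, whereas the measure-transformed covariance is clearly non-zero, which is the behavior Theorem 1 predicts for dependent random variables.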
Finally, we show that MTCCA with the exponential MT-functions in (17) is translation-invariant. Let
$X' = X + \alpha$ and $Y' = Y + \beta$, where $\alpha$ and $\beta$ are deterministic vectors in $\mathbb{R}^p$ and
$\mathbb{R}^q$, respectively. According to (9) and (17), $\varphi_{u_E, v_E}(X, Y) = \varphi_{u_E, v_E}(X', Y')$.
Therefore, by (12)-(14): $\Sigma_{X'}^{(u_E, v_E)}(s, t) = \Sigma_X^{(u_E, v_E)}(s, t)$,
$\Sigma_{Y'}^{(u_E, v_E)}(s, t) = \Sigma_Y^{(u_E, v_E)}(s, t)$, and
$\Sigma_{X'Y'}^{(u_E, v_E)}(s, t) = \Sigma_{XY}^{(u_E, v_E)}(s, t)$. Thus, by (15), the MT-canonical
correlation coefficients are invariant to translation, i.e.,

$$\rho_k(\Sigma_{X'}^{(u_E, v_E)}(s, t), \Sigma_{Y'}^{(u_E, v_E)}(s, t), \Sigma_{X'Y'}^{(u_E, v_E)}(s, t)) = \rho_k(\Sigma_X^{(u_E, v_E)}(s, t), \Sigma_Y^{(u_E, v_E)}(s, t), \Sigma_{XY}^{(u_E, v_E)}(s, t))$$

for $k = 1, \ldots, r$.
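
The invariance of the weighting function can also be seen at the sample level. The following sketch assumes NumPy and uses an empirical counterpart of the weighting function, namely the product of the exponential MT-functions divided by its sample mean; the function name `exp_weights`, the data and the shift vectors are hypothetical. Shifting the samples multiplies every weight by the same constant, which cancels in the normalization.

```python
import numpy as np

def exp_weights(x, y, s, t):
    """Empirical exponential MT weights u_E(x_n) v_E(y_n) / mean(u_E v_E).

    x: (N, p) samples, y: (N, q) samples, s: (p,), t: (q,).
    The maximum exponent is subtracted before exponentiating; this common
    factor cancels in the ratio and avoids overflow.
    """
    e = x @ s + y @ t
    w = np.exp(e - e.max())
    return w / w.mean()

rng = np.random.default_rng(1)
x = rng.standard_normal((500, 2))
y = rng.standard_normal((500, 3))
s = np.array([0.3, -0.2])
t = np.array([0.1, 0.0, 0.4])

alpha = np.array([5.0, -3.0])                 # deterministic shift of X
beta = np.array([1.0, 2.0, -4.0])             # deterministic shift of Y

w = exp_weights(x, y, s, t)
w_shifted = exp_weights(x + alpha, y + beta, s, t)
```

Since the weights agree and covariances are themselves unaffected by translation, the weighted covariance matrices, and hence the MT-canonical correlation coefficients, are unchanged by the shift.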

B. Gaussian MT-functions 

Next we define the MT-functions $u_G(\cdot; \cdot, \cdot)$ and $v_G(\cdot; \cdot, \cdot)$ by

$$u_G(x; s, \sigma) = \frac{1}{(2\pi\sigma^2)^{p/2}} \exp\left(-\frac{\|x - s\|_2^2}{2\sigma^2}\right) \quad \text{and} \quad v_G(y; t, \tau) = \frac{1}{(2\pi\tau^2)^{q/2}} \exp\left(-\frac{\|y - t\|_2^2}{2\tau^2}\right), \tag{21}$$

where $s \in \mathbb{R}^p$, $t \in \mathbb{R}^q$, $\sigma \in \mathbb{R}_+$, $\tau \in \mathbb{R}_+$, and
$\|\cdot\|_2$ denotes the $\ell_2$-norm. Since $u_G(\cdot; \cdot, \cdot)$ and $v_G(\cdot; \cdot, \cdot)$
are strictly positive and bounded, one can easily verify that the condition in (7) is satisfied. Relations (9)
and (14) imply that the MT-functions (21) produce a weighted cross-covariance matrix, for which the
observations are weighted by factors that decay with the squared distances $\|x - s\|_2^2$ and $\|y - t\|_2^2$.
Hence, the resulting MT-correlation coefficient is a measure of local linear dependence in the vicinity of
$(s, t)$. We note that local linear dependence exists whenever there are global non-linear dependencies.
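
The localization effect of the Gaussian MT-functions can be sketched at the sample level as follows. The code assumes NumPy and uses an empirical weighting function obtained by dividing the product $u_G v_G$ by its sample mean, so that the normalizing constants in (21) cancel; the function name `gaussian_weights` and the three observations are hypothetical.

```python
import numpy as np

def gaussian_weights(x, y, s, t, sigma, tau):
    """Empirical Gaussian MT weights u_G(x_n) v_G(y_n) / mean(u_G v_G).

    The normalizing constants of u_G and v_G cancel in the ratio, so only
    the exponents are evaluated.
    """
    dx = np.sum((x - s) ** 2, axis=1) / (2.0 * sigma ** 2)
    dy = np.sum((y - t) ** 2, axis=1) / (2.0 * tau ** 2)
    w = np.exp(-(dx + dy))
    return w / w.mean()

x = np.array([[0.0], [1.0], [3.0]])           # three scalar observations of X
y = np.array([[0.0], [1.0], [3.0]])           # and of Y
w = gaussian_weights(x, y, s=np.array([1.0]), t=np.array([1.0]),
                     sigma=1.0, tau=1.0)
```

The observation located at $(s, t) = (1, 1)$ receives the largest weight and the most distant one receives the smallest, so covariances computed with these weights mainly reflect the behavior of the data near $(s, t)$.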

Sensitivity of $\Sigma_{XY}^{(u_G, v_G)}(s, t)$ to non-linear relationships between $X$ and $Y$ is shown via the
following Theorem.

Theorem 2. Let $\sigma$, $\tau$ be fixed and positive. Additionally, let $\mathcal{U}$ denote an arbitrary open
region in $\mathbb{R}^p \times \mathbb{R}^q$ containing the origin. The random vectors $X$ and $Y$ are statistically
independent under the joint probability measure $P_{XY}$ if and only if

$$\Sigma_{XY}^{(u_G, v_G)}(s, t) = 0 \quad \forall\ (s, t) \in \mathcal{U}. \tag{22}$$

[A proof is given in Appendix C]

Hence, if $X$ and $Y$ are statistically dependent under $P_{XY}$, then there exist $a \in \mathbb{R}^p$,
$b \in \mathbb{R}^q$, $s \in \mathbb{R}^p$ and $t \in \mathbb{R}^q$ such that
$a^T \Sigma_{XY}^{(u_G, v_G)}(s, t)\, b \neq 0$. Therefore, again, non-linear dependencies can be detected
using MTCCA.
C. Comparison between the exponential and Gaussian MT-functions

Unlike MTCCA with Gaussian MT-functions (21), MTCCA with exponential MT-functions (17) is
translation invariant. Moreover, in MTCCA with Gaussian MT-functions, in addition to the location
parameters $s$, $t$, which have the same dimensionality as the scaling parameters of the exponential
MT-functions, one has to set two width parameters $\sigma$ and $\tau$. On the other hand, unlike the exponential
MT-functions, the Gaussian MT-functions are bounded on the joint observation space
$\mathcal{X} \times \mathcal{Y}$. Hence, MTCCA with Gaussian MT-functions is more robust to outliers. Additionally,
the Gaussian MT-functions have the property that they localize linear dependence over the observation space.
This property is illustrated in Subsection VI-A. Additional common properties of the exponential and Gaussian
MT-functions are given in the following remarks:

Remark 1. Since the exponential and Gaussian MT-functions are strictly positive, by Property 4 of
Proposition 1 we conclude that $Q_{XY}^{(u_E, v_E)}$ and $Q_{XY}^{(u_G, v_G)}$ preserve statistical dependence
under $P_{XY}$.

Remark 2. The exponential and Gaussian MT-functions preserve Gaussianity in the sense that if $X$ and
$Y$ are jointly Gaussian under $P_{XY}$, then they are jointly Gaussian under $Q_{XY}^{(u_E, v_E)}$ and
$Q_{XY}^{(u_G, v_G)}$.



D. Selection of the MT-functions parameters 

A natural choice of the parameters $s$ and $t$ would be those that maximize the first-order MT-canonical
correlation coefficient $\rho_1(\Sigma_X^{(u,v)}(s, t), \Sigma_Y^{(u,v)}(s, t), \Sigma_{XY}^{(u,v)}(s, t))$ in (15).
However, this maximization is analytically cumbersome. Therefore, as an alternative, we propose maximizing a
lower bound on $\rho_1(\Sigma_X^{(u,v)}(s, t), \Sigma_Y^{(u,v)}(s, t), \Sigma_{XY}^{(u,v)}(s, t))$. We show that the
resultant first-order MT-canonical correlation coefficient will be sensitive to dependence between $X$ and $Y$.



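Proposition 2. Define the following element-by-element average:

$$\psi(\Sigma_X^{(u,v)}(s, t), \Sigma_Y^{(u,v)}(s, t), \Sigma_{XY}^{(u,v)}(s, t)) \triangleq \left( \frac{1}{pq} \sum_{i=1}^{p} \sum_{j=1}^{q} \frac{[\Sigma_{XY}^{(u,v)}(s, t)]_{i,j}^2}{[\Sigma_X^{(u,v)}(s, t)]_{i,i}\, [\Sigma_Y^{(u,v)}(s, t)]_{j,j}} \right)^{1/2}, \tag{23}$$

where $[A]_{i,j}$ denotes the $i,j$-th entry of $A$. Then

$$\psi(\Sigma_X^{(u,v)}(s, t), \Sigma_Y^{(u,v)}(s, t), \Sigma_{XY}^{(u,v)}(s, t)) \le \rho_1(\Sigma_X^{(u,v)}(s, t), \Sigma_Y^{(u,v)}(s, t), \Sigma_{XY}^{(u,v)}(s, t)). \tag{24}$$

[A proof is given in Appendix D]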
Proposition 2 suggests choosing the optimal MT-function parameters by maximizing the lower bound
in (24):

$$(s^*, t^*) = \arg\max_{(s, t) \in \mathcal{V}} \psi(\Sigma_X^{(u,v)}(s, t), \Sigma_Y^{(u,v)}(s, t), \Sigma_{XY}^{(u,v)}(s, t)), \tag{25}$$

where $\mathcal{V}$ is a closed region in $\mathbb{R}^p \times \mathbb{R}^q$ containing the origin. Under the
MT-function pairs in (17) and (21) one can verify that
$\psi(\Sigma_X^{(u,v)}(s, t), \Sigma_Y^{(u,v)}(s, t), \Sigma_{XY}^{(u,v)}(s, t))$ is continuous in
$\mathbb{R}^p \times \mathbb{R}^q$. Therefore, by the extreme value theorem [ ] it has a maximum in $\mathcal{V}$.
The maximization problem in (25) can be solved numerically, e.g., using gradient ascent [ ] or greedy search
over the region $\mathcal{V}$.
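
The parameter selection step can be sketched numerically as follows. This is a minimal sketch, not the procedure of Appendix G: it assumes NumPy, uses the exponential MT-functions, replaces the covariance matrices by plain weighted sample averages, and swaps gradient ascent or greedy search for an exhaustive search over a small hypothetical grid. The names `mt_cov`, `psi` and `select_exponential_parameters`, as well as the simulated data, are illustrative assumptions.

```python
import itertools
import numpy as np

def mt_cov(x, y, w):
    """Weighted covariance and cross-covariance matrices (w rescaled to mean 1)."""
    w = w / w.mean()
    n = len(w)
    xw, yw = x * w[:, None], y * w[:, None]
    mx, my = xw.mean(axis=0), yw.mean(axis=0)
    sx = xw.T @ x / n - np.outer(mx, mx)
    sy = yw.T @ y / n - np.outer(my, my)
    sxy = xw.T @ y / n - np.outer(mx, my)
    return sx, sy, sxy

def psi(sx, sy, sxy):
    """Root-mean-square of the entry-wise correlation coefficients."""
    r2 = sxy ** 2 / np.outer(np.diag(sx), np.diag(sy))
    return np.sqrt(r2.mean())

def select_exponential_parameters(x, y, grid):
    """Exhaustive search for (s, t) maximizing psi; `grid` holds the candidate
    values used for every coordinate of s and t."""
    p, q = x.shape[1], y.shape[1]
    best = (-np.inf, None, None)
    for point in itertools.product(grid, repeat=p + q):
        s, t = np.array(point[:p]), np.array(point[p:])
        e = x @ s + y @ t
        w = np.exp(e - e.max())               # common factor cancels in mt_cov
        value = psi(*mt_cov(x, y, w))
        if value > best[0]:
            best = (value, s, t)
    return best

rng = np.random.default_rng(2)
x = rng.standard_normal((20_000, 1))
y = x ** 2 + 0.1 * rng.standard_normal((20_000, 1))
grid = np.linspace(-0.4, 0.4, 5)              # small region around the origin
psi_star, s_star, t_star = select_exponential_parameters(x, y, grid)
psi_origin = psi(*mt_cov(x, y, np.ones(len(x))))   # value without any transform
```

For this uncorrelated but dependent pair, the value at the origin is close to zero while the maximized value is substantially larger. An exhaustive grid scales exponentially in $p + q$, so it is only meant to make the objective in (25) concrete.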

The following theorem justifies the use of the first-order MT-canonical correlation coefficient as a
measure of statistical independence.

Theorem 3. The random vectors $X$ and $Y$ are statistically independent under $P_{XY}$ if and only if

$$\rho_1(\Sigma_X^{(u,v)}(s^*, t^*), \Sigma_Y^{(u,v)}(s^*, t^*), \Sigma_{XY}^{(u,v)}(s^*, t^*)) = 0,$$

where $(u, v)$ are the MT-functions in (17) or (21), and $(s^*, t^*)$ are selected according to (25). [A proof
is given in Appendix E]

Therefore, if the MT-functions and their parameters are selected as in Theorem 3, we conclude that
$X$ and $Y$ are statistically independent under $P_{XY}$ if and only if they are uncorrelated under
$Q_{XY}^{(u,v)}$. Hence, since by Property 3 of Proposition 1 $Q_{XY}^{(u,v)}$ preserves statistical independence
under $P_{XY}$, we also conclude that $X$ and $Y$ are statistically independent under $Q_{XY}^{(u,v)}$ if and only
if they are uncorrelated under $Q_{XY}^{(u,v)}$.

V. Empirical implementation of MTCCA 

Given $N$ i.i.d. samples of $(X, Y)$, an empirical version of MTCCA (15) can be implemented by
replacing $\Sigma_X^{(u,v)}$, $\Sigma_Y^{(u,v)}$ and $\Sigma_{XY}^{(u,v)}$ in (15), (16) and (25) with their sample
covariance estimates. Hence, strongly consistent estimators of $\Sigma_X^{(u,v)}$, $\Sigma_Y^{(u,v)}$ and
$\Sigma_{XY}^{(u,v)}$ are constructed, based on $N$ i.i.d. samples of $(X, Y)$.

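Proposition 3. Let $(X_n, Y_n)$, $n = 1, \ldots, N$, denote a sequence of i.i.d. samples from the joint distribution
$P_{XY}$, and define the empirical covariance estimates

$$\hat{\Sigma}_X^{(u,v)} \triangleq \frac{1}{N-1} \sum_{n=1}^{N} X_n X_n^T \hat{\varphi}_{u,v}(X_n, Y_n) - \frac{N}{N-1} \hat{\mu}_X^{(u,v)} \hat{\mu}_X^{(u,v)T}, \tag{26}$$

$$\hat{\Sigma}_Y^{(u,v)} \triangleq \frac{1}{N-1} \sum_{n=1}^{N} Y_n Y_n^T \hat{\varphi}_{u,v}(X_n, Y_n) - \frac{N}{N-1} \hat{\mu}_Y^{(u,v)} \hat{\mu}_Y^{(u,v)T}, \tag{27}$$

and

$$\hat{\Sigma}_{XY}^{(u,v)} \triangleq \frac{1}{N-1} \sum_{n=1}^{N} X_n Y_n^T \hat{\varphi}_{u,v}(X_n, Y_n) - \frac{N}{N-1} \hat{\mu}_X^{(u,v)} \hat{\mu}_Y^{(u,v)T}, \tag{28}$$

where

$$\hat{\mu}_X^{(u,v)} \triangleq \frac{1}{N} \sum_{n=1}^{N} X_n \hat{\varphi}_{u,v}(X_n, Y_n), \quad \hat{\mu}_Y^{(u,v)} \triangleq \frac{1}{N} \sum_{n=1}^{N} Y_n \hat{\varphi}_{u,v}(X_n, Y_n), \tag{29}$$

and

$$\hat{\varphi}_{u,v}(X_n, Y_n) \triangleq \frac{u(X_n) v(Y_n)}{\frac{1}{N} \sum_{m=1}^{N} u(X_m) v(Y_m)}. \tag{30}$$

Assume

$$E[u^4(X); P_X] < \infty, \quad E[v^4(Y); P_Y] < \infty, \tag{31}$$

$$E[X_k^4; P_X] < \infty \ \ \forall\ k = 1, \ldots, p \quad \text{and} \quad E[Y_l^4; P_Y] < \infty \ \ \forall\ l = 1, \ldots, q, \tag{32}$$

where $X_k$ and $Y_l$ denote the $k$-th and the $l$-th entries of $X$ and $Y$, respectively. Then
$\hat{\Sigma}_X^{(u,v)} \to \Sigma_X^{(u,v)}$, $\hat{\Sigma}_Y^{(u,v)} \to \Sigma_Y^{(u,v)}$ and
$\hat{\Sigma}_{XY}^{(u,v)} \to \Sigma_{XY}^{(u,v)}$ almost surely as $N \to \infty$. [A proof is given in Appendix F]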

Note that for $u(X) = 1$ and $v(Y) = 1$, the estimators $\hat{\Sigma}_X^{(u,v)}$, $\hat{\Sigma}_Y^{(u,v)}$, and
$\hat{\Sigma}_{XY}^{(u,v)}$ reduce to the standard unbiased estimators of the covariance and cross-covariance
matrices $\Sigma_X$, $\Sigma_Y$ and $\Sigma_{XY}$, respectively.
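
The estimators of Proposition 3 translate into a few lines of array code. The sketch below assumes NumPy and takes the products $u(X_n) v(Y_n)$ as a precomputed vector; it normalizes the weighted second moments by $N - 1$ and scales the mean-product terms by $N/(N-1)$, which is the form under which the reduction to the unbiased estimators noted above holds. The function name `mt_covariance_estimates` and the simulated data are hypothetical.

```python
import numpy as np

def mt_covariance_estimates(x, y, uv):
    """Empirical measure-transformed covariance and cross-covariance matrices.

    x: (N, p) samples, y: (N, q) samples, uv: (N,) values u(X_n) * v(Y_n).
    """
    n = x.shape[0]
    phi = uv / uv.mean()                      # empirical weighting function
    xw = phi[:, None] * x
    yw = phi[:, None] * y
    mu_x = xw.mean(axis=0)                    # weighted mean of X
    mu_y = yw.mean(axis=0)                    # weighted mean of Y
    c = n / (n - 1.0)
    s_x = xw.T @ x / (n - 1.0) - c * np.outer(mu_x, mu_x)
    s_y = yw.T @ y / (n - 1.0) - c * np.outer(mu_y, mu_y)
    s_xy = xw.T @ y / (n - 1.0) - c * np.outer(mu_x, mu_y)
    return s_x, s_y, s_xy

rng = np.random.default_rng(3)
x = rng.standard_normal((200, 2))
y = rng.standard_normal((200, 3))

# With u = v = 1 the estimates coincide with the unbiased sample covariances.
s_x, s_y, s_xy = mt_covariance_estimates(x, y, np.ones(200))
```

Each call costs a number of operations proportional to $(p + q)^2 N$, since it only forms weighted sums of outer products over the samples.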

The empirical MTCCA procedure with the exponential and Gaussian MT-functions is given in Appendix
G. In the first stage of the procedure, the parameters of the MT-functions are selected by solving a single
$(p + q)$-dimensional maximization problem (64) using gradient ascent. It can be shown that each iteration
of the gradient ascent algorithm, which only involves the empirical measure transformed covariance
and cross-covariance matrices, has an asymptotic computational load (ACL) of $O((p + q)^2 N)$ flops.
In the second stage, the empirical MT-canonical correlation coefficients and directions are
obtained simultaneously by solving the GEVD problem (65) with an ACL of $O((p + q)^3)$ flops. Unlike
the empirical MTCCA, the empirical ICCA [11] involves a sequence of $(p + q)$-dimensional numerical
maximizations, one for each pair of canonical directions, using an iterative Newton-Raphson algorithm.
It can be shown that each iteration of the Newton-Raphson algorithm, which involves re-estimation of
the mutual-information in a non-parametric manner and inversion of a Hessian matrix, has an ACL of
$O((p + q) N^2 + (p + q + 2k)^3)$ flops, where $k$ denotes the index of the canonical direction pair. The empirical
KCCA procedure [14]-[16], which involves computation of two $N \times N$ Gram matrices followed by
solving a GEVD problem, has an ACL of $O((p + q) N^2 + N^3)$ flops. Hence, one sees that, unlike the
empirical ICCA and KCCA, the computational complexity of the empirical MTCCA is linear in $N$,
which makes it favorable in large sample size scenarios.
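
To give a feeling for these growth rates, the following sketch evaluates the leading terms of the stated loads for MTCCA and KCCA. It is an order-of-magnitude illustration only: constants hidden by the $O(\cdot)$ notation are ignored, and the function names, the dimensions and the number of gradient ascent iterations are hypothetical.

```python
def mtcca_flops(p, q, n, iterations):
    """Leading-order count: gradient ascent iterations plus one GEVD."""
    return iterations * (p + q) ** 2 * n + (p + q) ** 3

def kcca_flops(p, q, n):
    """Leading-order count: two Gram matrices plus one N x N GEVD."""
    return (p + q) * n ** 2 + n ** 3

p, q, iterations = 2, 2, 100                  # illustrative values only

mtcca_small = mtcca_flops(p, q, 1_000, iterations)
mtcca_large = mtcca_flops(p, q, 10_000, iterations)
kcca_small = kcca_flops(p, q, 1_000)
kcca_large = kcca_flops(p, q, 10_000)
```

Under these assumptions, a tenfold increase in $N$ raises the MTCCA count by roughly a factor of ten, while the KCCA count grows by roughly a factor of one thousand.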

VI. Numerical examples 

In this section, we illustrate the use of empirical LCCA, ICCA, KCCA and MTCCA for graphical
model selection. In every example below, the empirical MTCCA was performed with the exponential
and Gaussian MT-functions via the procedure in Appendix G. In ICCA, the empirical mutual
information, $\hat{I}_k$, between each pair of canonical variates was mapped to the interval [0, 1] via the formula
$\hat{\rho}_k = \sqrt{1 - \exp(-2\hat{I}_k)}$, which produces the empirical informational canonical correlation coefficients. The
empirical KCCA was performed using Gaussian radial basis function kernels. Since KCCA masks the
original coordinates of X and Y, it is not illustrated for the graphical model selection tasks in simulation
examples 1 and 2, which involve variable selection. In simulation examples 1 and 2, the canonical
correlation coefficients and canonical directions were estimated using N = 1000 i.i.d. samples of X
and Y. The statistical significance of the empirical canonical correlation coefficients was tested using
empirical estimates of p-values associated with rejecting the null-hypothesis of no statistical dependence
between X and Y (see Appendix H).
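The mapping from mutual information to the interval [0, 1] used for ICCA can be sketched in a few lines. The function names are hypothetical. The check relies on the standard closed form for the mutual information of a bivariate Gaussian pair with correlation r, namely -0.5 log(1 - r^2), for which the mapping returns |r|.

```python
import math


def informational_correlation(mutual_information):
    """Map a mutual information value (in nats) to [0, 1] via sqrt(1 - exp(-2 I))."""
    return math.sqrt(1.0 - math.exp(-2.0 * mutual_information))


def gaussian_mutual_information(r):
    """Mutual information (in nats) of a bivariate Gaussian pair with correlation r."""
    return -0.5 * math.log(1.0 - r * r)


# For jointly Gaussian variates the mapped value coincides with |r|.
recovered = informational_correlation(gaussian_mutual_information(0.6))
print(round(recovered, 6))
```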

A. Simulation example 1: Selection of graphical model with non-linear connections 

In this example, we consider the random vectors $\mathbf{X} = [X_1, X_2]^T$ and $\mathbf{Y} = [Y_1, Y_2]^T$, where

$Y_1 = \cos(X_1) + 0.1W,$

and $X_1$, $X_2$, $Y_2$, and $W$ are mutually independent standard normal random variables. For this example,
the pair of linear combinations of the form $(\mathbf{a}^T\mathbf{X}, \mathbf{b}^T\mathbf{Y})$ having maximal dependency is obtained for the
vector pair $(\mathbf{a}_1 = [1, 0]^T, \mathbf{b}_1 = [1, 0]^T)$, which is identical to the true first-order MT-canonical directions.
In this example, all pairs of linear combinations of the form $\mathbf{a}^T\mathbf{X}$ and $\mathbf{b}^T\mathbf{Y}$ have zero linear correlation
even though they are not statistically independent. The dependencies between X and Y are depicted by
the bipartite graphical model in Fig. 1.
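A minimal sketch of this data model, assuming NumPy and an arbitrary seed, shows why a linear measure misses the dependence: the sample correlation between $X_1$ and $Y_1$ is close to zero, whereas the correlation between $\cos(X_1)$ and $Y_1$ is close to one. The function name `simulate_example_1` is hypothetical.

```python
import numpy as np


def simulate_example_1(n, seed=0):
    """Draw n samples of X = [X1, X2] and Y = [Y1, Y2] with Y1 = cos(X1) + 0.1 W."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, 2))
    w = rng.standard_normal(n)
    y1 = np.cos(x[:, 0]) + 0.1 * w
    y2 = rng.standard_normal(n)  # Y2 is independent of everything else
    return x, np.column_stack([y1, y2])


x, y = simulate_example_1(1000)
linear_corr = np.corrcoef(x[:, 0], y[:, 0])[0, 1]             # near zero
nonlinear_corr = np.corrcoef(np.cos(x[:, 0]), y[:, 0])[0, 1]  # near one
print(f"corr(X1, Y1) = {linear_corr:.3f}, corr(cos(X1), Y1) = {nonlinear_corr:.3f}")
```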

The averaged estimates of the MT, linear, and informational canonical correlation coefficients and
their corresponding averaged p-values, based on 1000 Monte-Carlo simulations, are given in Table I.
The sample means and standard deviations of the absolute dot products of $(\mathbf{a}_1/\|\mathbf{a}_1\|_2, \hat{\mathbf{a}}_1/\|\hat{\mathbf{a}}_1\|_2)$ and
$(\mathbf{b}_1/\|\mathbf{b}_1\|_2, \hat{\mathbf{b}}_1/\|\hat{\mathbf{b}}_1\|_2)$, based on 1000 Monte-Carlo simulations, are given in Table II. The absolute dot
products should be equal to 1 when the estimated canonical directions $\hat{\mathbf{a}}_1$, $\hat{\mathbf{b}}_1$ are equal to $\mathbf{a}_1 = [1, 0]^T$,
$\mathbf{b}_1 = [1, 0]^T$, respectively. One can notice that in contrast to LCCA, the MTCCA and ICCA detect the
true dependencies between X and Y, depicted by the bipartite graphical model in Fig. 1.

Fig. 1. The graphical model of dependencies in simulation example 1. A single edge exists between $X_1$ and $Y_1$ due to the
non-linear relation model $Y_1 = \cos(X_1) + 0.1W$. The correlation between $X_1$ and $Y_1$ under $P_{XY}$ is equal to zero even though
they are dependent.

TABLE I

Simulation example 1: The averaged estimates of the MT, linear, and informational canonical
correlation coefficients and their corresponding averaged p-values (in parentheses).

| | Exponential MT-functions | Gaussian MT-functions | LCCA | ICCA |
|---|---|---|---|---|
| $\hat{\rho}_1$ | 0.83 (0) | 0.88 (0) | 0.06 (0.37) | 0.85 (0) |
| $\hat{\rho}_2$ | 0.04 (0.38) | 0.03 (0.36) | 0.01 (0.45) | 0.23 (0.42) |

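TABLE II

Simulation example 1: The sample means and standard deviations (in parentheses) of $c(\mathbf{a}_1, \hat{\mathbf{a}}_1)$ and
$c(\mathbf{b}_1, \hat{\mathbf{b}}_1)$, where $c(\mathbf{u}, \mathbf{v}) \triangleq \frac{|\mathbf{u}^T\mathbf{v}|}{\|\mathbf{u}\|_2 \|\mathbf{v}\|_2}$.

| | Exponential MT-functions | Gaussian MT-functions | LCCA | ICCA |
|---|---|---|---|---|
| $c(\mathbf{a}_1, \hat{\mathbf{a}}_1)$ | 0.99 ($7 \cdot 10^{-4}$) | 0.99 ($3 \cdot 10^{-4}$) | 0.73 (0.27) | 0.99 ($2 \cdot 10^{-5}$) |
| $c(\mathbf{b}_1, \hat{\mathbf{b}}_1)$ | 0.99 ($4 \cdot 10^{-4}$) | 0.99 ($1 \cdot 10^{-4}$) | 0.75 (0.22) | 0.99 ($1 \cdot 10^{-5}$) |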



Scatter plots of the empirical first-order MT, linear, and informational canonical variates $(\hat{\mathbf{a}}_1^T\mathbf{X}, \hat{\mathbf{b}}_1^T\mathbf{Y})$
are shown in Figs. 2(a)-2(d). Observe that unlike LCCA, MTCCA and ICCA recover the true non-linear
relation between X and Y, which has a raised cosine shape. In these figures, we have also plotted the
ellipses associated with the empirical covariance matrices of $[\hat{\mathbf{a}}_1^T\mathbf{X}, \hat{\mathbf{b}}_1^T\mathbf{Y}]^T$ under the probability measures
$Q_{XY}^{(u_E,v_E)}$, $Q_{XY}^{(u_G,v_G)}$, and $P_{XY}$, respectively. Observing Figs. 2(a) and 2(b) one can notice that the local
linear trend is better captured by MTCCA with Gaussian MT-functions due to their localization property,
discussed in Subsection IV-B.




Fig. 2. Simulation example 1: Scatter plots of the empirical first-order canonical variates obtained by: (a) MTCCA with
exponential MT-functions, (b) MTCCA with Gaussian MT-functions, (c) LCCA, and (d) ICCA. Note that, while the linear
canonical variates are uninformative (circular Gaussian distributed), the MT and informational canonical variates have captured
the non-linear structure (raised cosine shape) of the non-linear model. This occurs since all variables in example 1 have zero
correlation but some variables are non-linearly dependent. The ellipses represent the associated covariance matrices under the
probability measures $Q_{XY}^{(u_E,v_E)}$, $Q_{XY}^{(u_G,v_G)}$, and $P_{XY}$, respectively.

B. Simulation example 2: Selection of graphical model with linear and non-linear connections

In this example, we consider a more complex model. Let the random vectors $\mathbf{X} = [X_1, X_2, X_3, X_4, X_5]^T$
and $\mathbf{Y} = [Y_1, Y_2, Y_3]^T$ satisfy

$Y_1 = X_1 + 0.5X_2 + 0.1W_1,$

$Y_2 = \cos(X_3 + 0.75X_4 + 0.5X_5) + 0.1W_2,$

where $X_i$, $i = 1, \ldots, 5$, $W_i$, $i = 1, 2$, and $Y_3$ are mutually independent standard normal random variables.
In this example there exist two independent pairs of linear combinations $(\mathbf{a}_k^T\mathbf{X}, \mathbf{b}_k^T\mathbf{Y})$, $k = 1, 2$, with
maximal inter-dependencies. These maximally dependent canonical variates are obtained for the vector
pairs $(\mathbf{a}_1 = [1, 0.5, 0, 0, 0]^T, \mathbf{b}_1 = [1, 0, 0]^T)$ and $(\mathbf{a}_2 = [0, 0, 1, 0.75, 0.5]^T, \mathbf{b}_2 = [0, 1, 0]^T)$, which are
also the first-order and second-order MT-canonical directions. The dependencies between X and Y are
depicted by the bipartite graphical model in Fig. 3.
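The model above can be simulated directly, and a plain linear CCA applied to it illustrates the limitation that MTCCA is meant to address. The sketch below assumes NumPy; the function names are hypothetical, and the linear canonical correlations are computed by Cholesky whitening followed by an SVD, which is one standard way to solve the LCCA problem and not necessarily the implementation used for the reported results.

```python
import numpy as np


def simulate_example_2(n, seed=0):
    """Draw n samples of the 5-dimensional X and 3-dimensional Y of simulation example 2."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, 5))
    w = rng.standard_normal((n, 2))
    y1 = x[:, 0] + 0.5 * x[:, 1] + 0.1 * w[:, 0]
    y2 = np.cos(x[:, 2] + 0.75 * x[:, 3] + 0.5 * x[:, 4]) + 0.1 * w[:, 1]
    y3 = rng.standard_normal(n)
    return x, np.column_stack([y1, y2, y3])


def linear_canonical_correlations(x, y):
    """Sample canonical correlations via whitening and SVD."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    n = x.shape[0]
    sx, sy, sxy = xc.T @ xc / n, yc.T @ yc / n, xc.T @ yc / n
    lx = np.linalg.cholesky(sx)          # sx = lx lx^T
    ly = np.linalg.cholesky(sy)          # sy = ly ly^T
    m = np.linalg.solve(lx, sxy)         # lx^{-1} sxy
    m = np.linalg.solve(ly, m.T).T       # lx^{-1} sxy ly^{-T}
    return np.linalg.svd(m, compute_uv=False)


x, y = simulate_example_2(1000)
rho = linear_canonical_correlations(x, y)
print(np.round(rho, 2))
```

The first coefficient is close to one because $Y_1$ is almost a linear function of X, while the remaining coefficients stay small even though $Y_2$ depends strongly on $X_3$, $X_4$ and $X_5$.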




Fig. 3. The dependency graphical model corresponding to simulation example 2. There are two connected components
$\{(X_1, Y_1), (X_2, Y_1)\}$ and $\{(X_3, Y_2), (X_4, Y_2), (X_5, Y_2)\}$.

The averaged estimates of the MT, linear, and informational canonical correlation coefficients and their
corresponding averaged p-values, based on 1000 Monte-Carlo simulations, are given in Table III. The
sample means and standard deviations of the absolute dot products of the pairs $(\mathbf{a}_k/\|\mathbf{a}_k\|_2, \hat{\mathbf{a}}_k/\|\hat{\mathbf{a}}_k\|_2)$ and
$(\mathbf{b}_k/\|\mathbf{b}_k\|_2, \hat{\mathbf{b}}_k/\|\hat{\mathbf{b}}_k\|_2)$, $k = 1, 2$, based on 1000 Monte-Carlo simulations, are given in Table IV. Observe
that both MTCCA and ICCA detect the true dependencies between X and Y, depicted by the bipartite
graphical model in Fig. 3. As expected, the LCCA detects only the linearly dependent combinations.

TABLE III

Simulation example 2: The averaged estimates of the MT, linear, and informational canonical
correlation coefficients and their corresponding averaged p-values (in parentheses).

| | Exponential MT-functions | Gaussian MT-functions | LCCA | ICCA |
|---|---|---|---|---|
| $\hat{\rho}_1$ | 1 (0) | 1 (0) | 1 (0) | 0.93 (0) |
| $\hat{\rho}_2$ | 0.75 (0) | 0.9 (0) | 0.08 (0.22) | 0.89 (0) |
| $\hat{\rho}_3$ | 0.08 (0.2) | 0.1 (0.18) | 0.04 (0.35) | 0.24 (0.27) |

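TABLE IV

Simulation example 2: The sample means and standard deviations (in parentheses) of $c(\mathbf{a}_k, \hat{\mathbf{a}}_k)$ and
$c(\mathbf{b}_k, \hat{\mathbf{b}}_k)$, $k = 1, 2$, where $c(\mathbf{u}, \mathbf{v}) \triangleq \frac{|\mathbf{u}^T\mathbf{v}|}{\|\mathbf{u}\|_2 \|\mathbf{v}\|_2}$.

| | Exponential MT-functions | Gaussian MT-functions | LCCA | ICCA |
|---|---|---|---|---|
| $c(\mathbf{a}_1, \hat{\mathbf{a}}_1)$ | 1 ($5 \cdot 10^{-5}$) | 1 ($5 \cdot 10^{-4}$) | 1 ($10^{-5}$) | 0.99 ($7 \cdot 10^{-4}$) |
| $c(\mathbf{a}_2, \hat{\mathbf{a}}_2)$ | 0.99 ($6 \cdot 10^{-3}$) | 0.99 ($8 \cdot 10^{-3}$) | 0.5 (0.28) | 0.99 ($1 \cdot 10^{-3}$) |
| $c(\mathbf{b}_1, \hat{\mathbf{b}}_1)$ | 1 ($8 \cdot 10^{-6}$) | 1 ($9 \cdot 10^{-4}$) | 1 ($2 \cdot 10^{-5}$) | 0.99 ($2 \cdot 10^{-3}$) |
| $c(\mathbf{b}_2, \hat{\mathbf{b}}_2)$ | 0.99 ($2 \cdot 10^{-3}$) | 0.99 ($6 \cdot 10^{-3}$) | 0.7 (0.26) | 0.99 ($3 \cdot 10^{-3}$) |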



C. Measuring long-term associations between NASDAQ/NYSE traded companies 

Here, MTCCA is applied to a real-world example of capturing long-term associations between pairs
of companies traded on the NASDAQ and NYSE stock markets. The compared companies were Microsoft
(MSFT), Intel (INTC), Apple (AAPL), Merck (MRK), Pfizer (PFE), Johnson and Johnson (JNJ),
American Express (AXP), JP Morgan (JPM), and Bank of America (BAC). For each pair of companies,
we considered the random vectors $\mathbf{X} = [X_1, X_2]^T$ and $\mathbf{Y} = [Y_1, Y_2]^T$. The variables $X_1$ and $Y_1$ are
the log-ratios of two consecutive daily closing prices of a stock, called log-returns. The variables $X_2$
and $Y_2$ are the log-ratios of two consecutive daily trading volumes of a stock, called log-volume ratios.
Consecutive daily measurements of X and Y from January 2, 2001 to December 31, 2010, comprising
2514 samples, were obtained from the WRDS database [35].
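The two features per stock can be computed as sketched below. The price and volume values are made up for illustration (they are not WRDS data), and the helper name `log_ratios` is hypothetical.

```python
import math


def log_ratios(series):
    """Log-ratio of each value to its predecessor: log(s[n] / s[n-1])."""
    return [math.log(curr / prev) for prev, curr in zip(series, series[1:])]


# Hypothetical daily closing prices and trading volumes of one stock.
closing_prices = [25.0, 25.5, 24.9, 25.2]
trading_volumes = [1.0e6, 1.2e6, 0.9e6, 1.1e6]

log_returns = log_ratios(closing_prices)         # plays the role of X1 (or Y1)
log_volume_ratios = log_ratios(trading_volumes)  # plays the role of X2 (or Y2)
print(log_returns, log_volume_ratios)
```

A series of length T yields T - 1 feature vectors, and the log-returns telescope: their sum equals the log-ratio of the last price to the first.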

Figs. 4(a) and 4(b) display the matrix of empirical first-order MT-canonical correlation coefficients for
the exponential and Gaussian MT-functions, respectively. Figs. 4(c)-4(e) show the matrix of empirical
first-order canonical correlation coefficients obtained by LCCA, ICCA and KCCA, respectively. Note that
MTCCA and KCCA better cluster companies in similar sectors: (MSFT, INTC, AAPL) - technology,
(MRK, PFE, JNJ) - pharmaceuticals, (AXP, JPM, BAC) - financial. In this example, the p-values associated
with all empirical first-order canonical correlation coefficients were less than 0.01.

The empirical first-order canonical correlation coefficients were used for constructing graphical models
in which the nodes represent the compared companies. A pair of nodes was connected when its
empirical first-order canonical correlation coefficient exceeded a threshold A. In Figs. 5-7 the
graphical models selected by MTCCA with exponential MT-functions are compared to LCCA, ICCA and
KCCA, respectively. Similarly, in Figs. 8-10 the graphical models selected by MTCCA with Gaussian
MT-functions are compared to LCCA, ICCA and KCCA, respectively. In the first column of each figure
we show the graphs selected by MTCCA for A = 0.5, 0.55, 0.58. In the second column we show the
corresponding graphs selected by the other compared method, obtained by scanning A over the interval [0, 1] and
finding the graph with minimum edit distance [36] to the MTCCA graph. The symmetric difference graphs are shown in the third
column. The red lines in the symmetric difference graphs indicate edges found by MTCCA and not by the
other compared method, and vice-versa for the black lines. Note that for all of the threshold parameters
A investigated, the MTCCA graph shows an equal or larger number of dependencies than the closest LCCA,
ICCA and KCCA graphs. This result suggests that MTCCA has captured more dependencies than LCCA,
ICCA and KCCA. While there is no ground truth validation, the fact that MTCCA clusters together
companies in similar sectors (banking, pharmaceuticals, and technology) provides anecdotal support for
the power and applicability of MTCCA.
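The graph construction and the search for the closest graph of a competing method can be sketched as follows. The coefficient values are invented for illustration, the function names are hypothetical, and the edit distance is taken to be the size of the symmetric difference of the edge sets. That choice is an assumption: it corresponds to unit-cost edge insertions and deletions on a fixed, labeled node set, which may differ from the definition in [36].

```python
def select_edges(coeffs, threshold):
    """Edges (node pairs) whose coefficient exceeds the threshold."""
    return {pair for pair, value in coeffs.items() if value > threshold}


def closest_graph(reference_edges, coeffs, grid_size=101):
    """Scan thresholds over [0, 1]; return (distance, threshold, edges) of the closest graph."""
    best = None
    for i in range(grid_size):
        threshold = i / (grid_size - 1)
        edges = select_edges(coeffs, threshold)
        distance = len(reference_edges ^ edges)  # symmetric difference size
        if best is None or distance < best[0]:
            best = (distance, threshold, edges)
    return best


# Hypothetical first-order coefficients for three companies.
pairs = [("MSFT", "INTC"), ("MSFT", "JPM"), ("INTC", "JPM")]
mtcca = dict(zip(map(frozenset, pairs), [0.62, 0.52, 0.41]))
other = dict(zip(map(frozenset, pairs), [0.45, 0.30, 0.35]))

reference = select_edges(mtcca, threshold=0.5)
distance, threshold, edges = closest_graph(reference, other)
print(distance, threshold, sorted(tuple(sorted(e)) for e in reference ^ edges))
```

The symmetric difference `reference ^ edges` is also what the third column of Figs. 5-10 displays.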

Fig. 11 depicts the distribution of the empirical MT, linear, and informational first-order canonical
directions $\hat{\mathbf{a}}_1 = [\hat{a}_{1,1}, \hat{a}_{1,2}]^T$ and $\hat{\mathbf{b}}_1 = [\hat{b}_{1,1}, \hat{b}_{1,2}]^T$ on the unit circle. Observe that in MTCCA (first
and second columns) $\hat{a}_{1,2}$ and $\hat{b}_{1,2}$ are relatively small in comparison to $\hat{a}_{1,1}$ and $\hat{b}_{1,1}$. One can conclude
that, unlike LCCA and ICCA, MTCCA is zeroing in on the strong non-linear dependencies between the
daily log-returns of these companies and is de-emphasizing the daily log-volume ratios. This analysis is
not performed for KCCA since the empirical canonical directions obtained by KCCA do not correspond
to the original coordinates of X and Y.

We note that in this example the difference between MTCCA and ICCA may possibly arise from the
sensitivity of fixed kernel density estimation, performed in ICCA, to the heavy-tailed financial data [37].

VII. Conclusion 

In this paper, LCCA was generalized by applying a structured transform to the joint probability
distribution of X and Y. By modifying the functions associated with the transform, this generalization, called
MTCCA, preserves independence and captures non-linear dependencies. Two classes of MTCCA were
proposed based on specification of MT-functions in the exponential and Gaussian families, respectively.
The proposed MTCCA approach was compared to LCCA, ICCA and KCCA for graphical model selection
in simulated data having non-linear dependencies, and for measuring long-term associations between pairs
of companies traded on the NASDAQ and NYSE stock markets. It is likely that there exist other classes
of MT-functions that have a similar capability to accurately detect non-linear dependencies.

In the paper we have shown that the Hessian of the joint cumulant generating function (18) is a special
case of measure transformed covariance matrix with exponential MT-functions. Therefore, similarly to
the generalization proposed in this paper, the techniques in [22]-[28], which are based on Hessians of
the cumulant generating function, may also be generalized by the measure-transformation framework.

Fig. 4. NASDAQ/NYSE experiment. Empirical first-order canonical correlation coefficients obtained by (a) MTCCA with
exponential MT-functions, (b) MTCCA with Gaussian MT-functions, (c) LCCA, (d) ICCA, and (e) KCCA. Note the three
blocks of mutually high canonical correlations revealed by MTCCA and KCCA; MTCCA and KCCA better cluster companies
in similar sectors: (MSFT, INTC, AAPL) - technology, (MRK, PFE, JNJ) - pharmaceuticals, (AXP, JPM, BAC) - financial.

Fig. 5. NASDAQ/NYSE experiment. Left column: The graphical models selected by MTCCA with exponential MT-functions
for A = 0.5, 0.55, 0.58. Middle column: The closest graphs selected by LCCA. Right column: The symmetric difference
graphs: the red lines indicate edges found by MTCCA and not by LCCA, and vice-versa for the black lines. For these values
of A, exponential MTCCA detects more dependencies than LCCA: the MTCCA graph has more edges than the closest LCCA
graph.

Fig. 6. NASDAQ/NYSE experiment. Left column: The graphical models selected by MTCCA with exponential MT-functions
for A = 0.5, 0.55, 0.58. Middle column: The closest graphs selected by ICCA. Right column: The symmetric difference
graphs: the red lines indicate edges found by MTCCA and not by ICCA, and vice-versa for the black lines. For these values
of A, exponential MTCCA detects more dependencies than ICCA: the MTCCA graph has more edges than the closest ICCA
graph.

Appendix

A. Proof of Proposition 1:
1) Property 1:

Since $\varphi_{u,v}(\mathbf{x}, \mathbf{y})$ is nonnegative, then by Corollary 2.3.6 in [32] $Q_{XY}^{(u,v)}$
is a measure on $\mathcal{S}_{\mathcal{X}\times\mathcal{Y}}$.
Furthermore, $Q_{XY}^{(u,v)}(\mathcal{X}\times\mathcal{Y}) = 1$ so that $Q_{XY}^{(u,v)}$ is a probability measure on $\mathcal{S}_{\mathcal{X}\times\mathcal{Y}}$.

Fig. 7. NASDAQ/NYSE experiment. Left column: The graphical models selected by MTCCA with exponential MT-functions
for A = 0.5, 0.55, 0.58. Middle column: The closest graphs selected by KCCA. Right column: The symmetric difference
graphs: the red lines indicate edges found by MTCCA and not by KCCA, and vice-versa for the black lines. For A = 0.58 the
MTCCA graph has one more edge than the closest KCCA graph.

2) Property 2: 

Follows from definitions 4.1.1 and 4.1.3 in [32]. 

3) Property 3: 

Let $Q_X^{(u,v)}$ and $Q_Y^{(u,v)}$ denote the marginal probability measures of $Q_{XY}^{(u,v)}$, defined on $\mathcal{S}_{\mathcal{X}}$ and
$\mathcal{S}_{\mathcal{Y}}$, respectively. Additionally, let $A_{\mathcal{X}}$ and $A_{\mathcal{Y}}$ denote arbitrary sets in the $\sigma$-algebras $\mathcal{S}_{\mathcal{X}}$ and $\mathcal{S}_{\mathcal{Y}}$,
respectively. Using (8) and (9), the assumed statistical independence of X and Y under $P_{XY}$, and
Tonelli's Theorem [33]:

$$Q_X^{(u,v)}(A_{\mathcal{X}}) = Q_{XY}^{(u,v)}(A_{\mathcal{X}}\times\mathcal{Y}) = \int_{A_{\mathcal{X}}\times\mathcal{Y}} \varphi_{u,v}(\mathbf{x},\mathbf{y})\, dP_{XY}(\mathbf{x},\mathbf{y}) = \int_{A_{\mathcal{X}}} \frac{u(\mathbf{x})}{E[u(\mathbf{X}); P_X]}\, dP_X(\mathbf{x}). \tag{33}$$

Similarly, it can be shown that $Q_Y^{(u,v)}(A_{\mathcal{Y}}) = \int_{A_{\mathcal{Y}}} \frac{v(\mathbf{y})}{E[v(\mathbf{Y}); P_Y]}\, dP_Y(\mathbf{y})$ and

$$Q_{XY}^{(u,v)}(A_{\mathcal{X}}\times A_{\mathcal{Y}}) = \int_{A_{\mathcal{X}}} \frac{u(\mathbf{x})}{E[u(\mathbf{X}); P_X]}\, dP_X(\mathbf{x}) \int_{A_{\mathcal{Y}}} \frac{v(\mathbf{y})}{E[v(\mathbf{Y}); P_Y]}\, dP_Y(\mathbf{y}) = Q_X^{(u,v)}(A_{\mathcal{X}})\, Q_Y^{(u,v)}(A_{\mathcal{Y}}). \tag{34}$$

Therefore, since $A_{\mathcal{X}}$ and $A_{\mathcal{Y}}$ are arbitrary, X and Y are statistically independent under the transformed
probability measure $Q_{XY}^{(u,v)}$.

Fig. 8. NASDAQ/NYSE experiment. Left column: The graphical models selected by MTCCA with Gaussian MT-functions for
A = 0.5, 0.55, 0.58. Middle column: The closest graphs selected by LCCA. Right column: The symmetric difference graphs:
the red lines indicate edges found by MTCCA and not by LCCA, and vice-versa for the black lines. For these values of A,
Gaussian MTCCA detects more dependencies than LCCA: the MTCCA graph has more edges than the closest LCCA graph.

Fig. 9. NASDAQ/NYSE experiment. Left column: The graphical models selected by MTCCA with Gaussian MT-functions for
A = 0.5, 0.55, 0.58. Middle column: The closest graphs selected by ICCA. Right column: The symmetric difference graphs:
the red lines indicate edges found by MTCCA and not by ICCA, and vice-versa for the black lines. For these values of A,
Gaussian MTCCA detects more dependencies than ICCA: the MTCCA graph has more edges than the closest ICCA graph.

4) Property 4: 

According to the definition of $\varphi_{u,v}(\mathbf{x}, \mathbf{y})$ in (9), the strict positivity of $u(\mathbf{x})$ and $v(\mathbf{y})$, and Property
2, we have that $Q_{XY}^{(u,v)}$ is absolutely continuous w.r.t. $P_{XY}$ with strictly positive Radon-Nikodym
derivative $\frac{dQ_{XY}^{(u,v)}(\mathbf{x},\mathbf{y})}{dP_{XY}(\mathbf{x},\mathbf{y})} = \varphi_{u,v}(\mathbf{x}, \mathbf{y})$. Therefore, by Proposition 4.1.2 in [ ] it is implied that $P_{XY}$
is absolutely continuous w.r.t. $Q_{XY}^{(u,v)}$ with a strictly positive Radon-Nikodym derivative given by

$$\frac{dP_{XY}(\mathbf{x},\mathbf{y})}{dQ_{XY}^{(u,v)}(\mathbf{x},\mathbf{y})} = \frac{1}{\varphi_{u,v}(\mathbf{x},\mathbf{y})}. \tag{35}$$

Hence, let $A_{\mathcal{X}}$ and $A_{\mathcal{Y}}$ denote arbitrary sets in the $\sigma$-algebras $\mathcal{S}_{\mathcal{X}}$ and $\mathcal{S}_{\mathcal{Y}}$, respectively. Using (9),
(35), the assumed statistical independence of X and Y under $Q_{XY}^{(u,v)}$, and Tonelli's Theorem [33]:

$$P_{XY}(A_{\mathcal{X}}\times A_{\mathcal{Y}}) = \int_{A_{\mathcal{X}}\times A_{\mathcal{Y}}} \frac{1}{\varphi_{u,v}(\mathbf{x},\mathbf{y})}\, dQ_{XY}^{(u,v)}(\mathbf{x},\mathbf{y}) = E[u(\mathbf{X})v(\mathbf{Y}); P_{XY}] \int_{A_{\mathcal{X}}} \frac{1}{u(\mathbf{x})}\, dQ_X^{(u,v)}(\mathbf{x}) \int_{A_{\mathcal{Y}}} \frac{1}{v(\mathbf{y})}\, dQ_Y^{(u,v)}(\mathbf{y}). \tag{36}$$

Similarly, it can be shown that

$$P_X(A_{\mathcal{X}}) = P_{XY}(A_{\mathcal{X}}\times\mathcal{Y}) = E[u(\mathbf{X})v(\mathbf{Y}); P_{XY}]\, E\left[\frac{1}{v(\mathbf{Y})}; Q_Y^{(u,v)}\right] \int_{A_{\mathcal{X}}} \frac{1}{u(\mathbf{x})}\, dQ_X^{(u,v)}(\mathbf{x}) \tag{37}$$

and

$$P_Y(A_{\mathcal{Y}}) = P_{XY}(\mathcal{X}\times A_{\mathcal{Y}}) = E[u(\mathbf{X})v(\mathbf{Y}); P_{XY}]\, E\left[\frac{1}{u(\mathbf{X})}; Q_X^{(u,v)}\right] \int_{A_{\mathcal{Y}}} \frac{1}{v(\mathbf{y})}\, dQ_Y^{(u,v)}(\mathbf{y}). \tag{38}$$

Now, using (1), (9), and (10) we have that

$$E\left[\frac{1}{u(\mathbf{X})}; Q_X^{(u,v)}\right] = E\left[\frac{\varphi_{u,v}(\mathbf{X},\mathbf{Y})}{u(\mathbf{X})}; P_{XY}\right] = \frac{E[v(\mathbf{Y}); P_Y]}{E[u(\mathbf{X})v(\mathbf{Y}); P_{XY}]}, \tag{39}$$

and similarly,

$$E\left[\frac{1}{v(\mathbf{Y})}; Q_Y^{(u,v)}\right] = E\left[\frac{\varphi_{u,v}(\mathbf{X},\mathbf{Y})}{v(\mathbf{Y})}; P_{XY}\right] = \frac{E[u(\mathbf{X}); P_X]}{E[u(\mathbf{X})v(\mathbf{Y}); P_{XY}]}. \tag{40}$$

Additionally, by setting $A_{\mathcal{X}} = \mathcal{X}$ and $A_{\mathcal{Y}} = \mathcal{Y}$ in (36), followed by using (1), (39), and (40) it is
implied that

$$E[u(\mathbf{X})v(\mathbf{Y}); P_{XY}] = E[u(\mathbf{X}); P_X]\, E[v(\mathbf{Y}); P_Y]. \tag{41}$$

Finally, substitution of (41) into (36), (40) into (37), and (39) into (38) yields

$$P_{XY}(A_{\mathcal{X}}\times A_{\mathcal{Y}}) = \int_{A_{\mathcal{X}}} \frac{E[u(\mathbf{X}); P_X]}{u(\mathbf{x})}\, dQ_X^{(u,v)}(\mathbf{x}) \int_{A_{\mathcal{Y}}} \frac{E[v(\mathbf{Y}); P_Y]}{v(\mathbf{y})}\, dQ_Y^{(u,v)}(\mathbf{y}) = P_X(A_{\mathcal{X}})\, P_Y(A_{\mathcal{Y}}), \tag{42}$$

and therefore, since $A_{\mathcal{X}}$ and $A_{\mathcal{Y}}$ are arbitrary, X and Y are statistically independent under $P_{XY}$. □

Fig. 10. NASDAQ/NYSE experiment. Left column: The graphical models selected by MTCCA with Gaussian MT-functions
for A = 0.5, 0.55, 0.58. Middle column: The closest graphs selected by KCCA. Right column: The symmetric difference
graphs: the red lines indicate edges found by MTCCA and not by KCCA, and vice-versa for the black lines. For A = 0.55, 0.58
the MTCCA graph has more edges than the closest KCCA graph.

Fig. 11. NASDAQ/NYSE experiment. Distribution of the empirical MT, linear, and informational first-order canonical directions
on the unit circle. Left to right ordering: First column - MTCCA with exponential MT-functions. Second column - MTCCA
with Gaussian MT-functions. Third column - LCCA. Fourth column - ICCA. The estimated MT-canonical directions in the first
and second columns are much more concentrated than the linear and informational canonical directions in the third and fourth
columns, respectively. In particular, while linear and informational canonical directions appear to be equally sensitive to the
daily log-returns and the daily log-volume ratios, MT-canonical directions are much more sensitive to the former than
to the latter.

B. Proof of Theorem 1:

Using (18) and (19) one can verify that if the condition in (20) is satisfied, then

$$M_{XY}(\mathbf{s}, \mathbf{t}) = M_X(\mathbf{s})\, M_Y(\mathbf{t}) \quad \forall (\mathbf{s}, \mathbf{t}) \in U, \tag{43}$$

where $M_X(\cdot)$ and $M_Y(\cdot)$ are the marginal moment generating functions of X and Y, respectively. The
joint moment generating function restricted to any open region containing the origin, within its region of
convergence, uniquely determines the joint distribution [29], [30] (this property stems from the analyticity
of the joint moment generating function about the origin). Hence, by the relation above we have that
X and Y are statistically independent. Conversely, if X and Y are statistically independent under $P_{XY}$,
then by Property 3 of Proposition 1 we have that $\Sigma_{XY}^{(u_E,v_E)}(\mathbf{s}, \mathbf{t}) = \mathbf{0}$ for all $(\mathbf{s}, \mathbf{t}) \in U$. □



C. Proof of Theorem 2: 

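Using (9), (14), and (21) one can easily verify that

$$\Sigma_{XY}^{(u_G,v_G)}(\mathbf{s},\mathbf{t}) = \frac{E\left[\mathbf{X}\mathbf{Y}^T g(\mathbf{X})h(\mathbf{Y})\exp\left(\frac{\mathbf{s}^T\mathbf{X}}{\sigma^2} + \frac{\mathbf{t}^T\mathbf{Y}}{\tau^2}\right); P_{XY}\right]}{E\left[g(\mathbf{X})h(\mathbf{Y})\exp\left(\frac{\mathbf{s}^T\mathbf{X}}{\sigma^2} + \frac{\mathbf{t}^T\mathbf{Y}}{\tau^2}\right); P_{XY}\right]} - \frac{E\left[\mathbf{X} g(\mathbf{X})h(\mathbf{Y})\exp\left(\frac{\mathbf{s}^T\mathbf{X}}{\sigma^2} + \frac{\mathbf{t}^T\mathbf{Y}}{\tau^2}\right); P_{XY}\right] E\left[\mathbf{Y}^T g(\mathbf{X})h(\mathbf{Y})\exp\left(\frac{\mathbf{s}^T\mathbf{X}}{\sigma^2} + \frac{\mathbf{t}^T\mathbf{Y}}{\tau^2}\right); P_{XY}\right]}{E^2\left[g(\mathbf{X})h(\mathbf{Y})\exp\left(\frac{\mathbf{s}^T\mathbf{X}}{\sigma^2} + \frac{\mathbf{t}^T\mathbf{Y}}{\tau^2}\right); P_{XY}\right]}, \tag{44}$$

where

$$g(\mathbf{X}) = \exp\left(-\frac{\|\mathbf{X}\|^2}{2\sigma^2}\right) \quad \text{and} \quad h(\mathbf{Y}) = \exp\left(-\frac{\|\mathbf{Y}\|^2}{2\tau^2}\right). \tag{45}$$

Additionally, define

$$M_{XY}^{(g,h)}(\mathbf{s}, \mathbf{t}) = E\left[\exp\left(\mathbf{s}^T\mathbf{X} + \mathbf{t}^T\mathbf{Y}\right); Q_{XY}^{(g,h)}\right] \tag{46}$$

as the joint moment generating function of X and Y under the transformed probability measure $Q_{XY}^{(g,h)}$
associated with the MT-functions $g(\mathbf{X})$ and $h(\mathbf{Y})$ in (45). Using (1) and (10) it can be shown that

$$M_{XY}^{(g,h)}(\mathbf{s}, \mathbf{t}) = E\left[\exp\left(\mathbf{s}^T\mathbf{X} + \mathbf{t}^T\mathbf{Y}\right)\varphi_{g,h}(\mathbf{X}, \mathbf{Y}); P_{XY}\right], \tag{47}$$

where $\varphi_{g,h}(\mathbf{X}, \mathbf{Y})$ is defined in (9). Therefore, by (44) and (47) we have that

$$\Sigma_{XY}^{(u_G,v_G)}(\mathbf{s}, \mathbf{t}) = \sigma^2\tau^2\, \frac{\partial^2 \log M_{XY}^{(g,h)}\left(\frac{\mathbf{s}}{\sigma^2}, \frac{\mathbf{t}}{\tau^2}\right)}{\partial\mathbf{s}\,\partial\mathbf{t}^T}. \tag{48}$$

Hence, if the condition in (22) is satisfied, then by the properties of the joint moment generating function
[29], [30], it is implied that X and Y are statistically independent under $Q_{XY}^{(g,h)}$. Thus, since the
MT-functions $g(\mathbf{X})$ and $h(\mathbf{Y})$ are strictly positive, then by Property 4 of Proposition 1 we conclude that
X and Y are statistically independent under $P_{XY}$. Conversely, if X and Y are statistically independent
under $P_{XY}$, then by Property 3 of Proposition 1 we have that $\Sigma_{XY}^{(u_G,v_G)}(\mathbf{s}, \mathbf{t}) = \mathbf{0}$ for all $(\mathbf{s}, \mathbf{t}) \in U$. □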
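The step from the Gaussian MT-functions to the functions $g$ and $h$ can be made explicit with a short worked expansion. It assumes that the Gaussian MT-function in (21), which is not restated in this appendix, has the form $u_G(\mathbf{x}; \mathbf{s}, \sigma) = \exp(-\|\mathbf{x} - \mathbf{s}\|^2 / (2\sigma^2))$; the symbol $c(\mathbf{s})$ is introduced here only for illustration.

```latex
\begin{align*}
u_G(\mathbf{x};\mathbf{s},\sigma)
  &= \exp\left(-\frac{\|\mathbf{x}-\mathbf{s}\|^2}{2\sigma^2}\right)
   = \exp\left(-\frac{\|\mathbf{x}\|^2 - 2\mathbf{s}^T\mathbf{x} + \|\mathbf{s}\|^2}{2\sigma^2}\right) \\
  &= \underbrace{\exp\left(-\frac{\|\mathbf{x}\|^2}{2\sigma^2}\right)}_{g(\mathbf{x})}
     \exp\left(\frac{\mathbf{s}^T\mathbf{x}}{\sigma^2}\right)
     \underbrace{\exp\left(-\frac{\|\mathbf{s}\|^2}{2\sigma^2}\right)}_{c(\mathbf{s})}.
\end{align*}
```

Under this assumption the factor $c(\mathbf{s})$ does not depend on $\mathbf{x}$, so it cancels between the numerator and denominator of $\varphi_{u,v}$ in (9). The same expansion applies to $v_G$ with $h$, $\mathbf{t}$ and $\tau$, which leaves the exponential tilt $\exp(\mathbf{s}^T\mathbf{X}/\sigma^2 + \mathbf{t}^T\mathbf{Y}/\tau^2)$ appearing in (44).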



D. Proof of Proposition 2: 

Let $\mathbf{e}_i^{(p)}$ denote a $p$-dimensional column vector whose $k$-th entry is $\delta(i - k)$, where $\delta(\cdot)$ denotes the Kronecker
delta function. It is easily verified that



EE 



(a,t)e 



(?) 



i=i j=i e i ^ x 
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(50) 
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a r £ 
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a r s («.«) 



(s,t)aA/b T E^' 1 ' ) (a,i)b 



maxa 2J X y (s,t)b s.t. a S x a = h S Y b = 1, 

a,b 



(a,t)b 



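where the last equality stems from the invariance of
$\frac{\mathbf{a}^T\Sigma_{XY}^{(u,v)}(\mathbf{s},\mathbf{t})\mathbf{b}}{\sqrt{\mathbf{a}^T\Sigma_X^{(u,v)}(\mathbf{s},\mathbf{t})\mathbf{a}}\sqrt{\mathbf{b}^T\Sigma_Y^{(u,v)}(\mathbf{s},\mathbf{t})\mathbf{b}}}$
to normalization of $\mathbf{a}$ and $\mathbf{b}$. Therefore, according to (15), (23) and (50), the relation in (24) is verified. □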



E. Proof of Theorem 3: 

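If $\rho_1\left(\Sigma_X^{(u,v)}(\mathbf{s}^*,\mathbf{t}^*), \Sigma_Y^{(u,v)}(\mathbf{s}^*,\mathbf{t}^*), \Sigma_{XY}^{(u,v)}(\mathbf{s}^*,\mathbf{t}^*)\right) = 0$, then by (24) and the positivity of $\psi(\cdot, \cdot, \cdot)$

$$\psi\left(\Sigma_X^{(u,v)}(\mathbf{s}^*,\mathbf{t}^*), \Sigma_Y^{(u,v)}(\mathbf{s}^*,\mathbf{t}^*), \Sigma_{XY}^{(u,v)}(\mathbf{s}^*,\mathbf{t}^*)\right) = 0.$$

Therefore, since by (25) $(\mathbf{s}^*, \mathbf{t}^*)$ are the maximizers of $\psi\left(\Sigma_X^{(u,v)}(\mathbf{s},\mathbf{t}), \Sigma_Y^{(u,v)}(\mathbf{s},\mathbf{t}), \Sigma_{XY}^{(u,v)}(\mathbf{s},\mathbf{t})\right)$ over
$V$, which is a closed region in $\mathbb{R}^p \times \mathbb{R}^q$ containing the origin, we have that

$$\psi\left(\Sigma_X^{(u,v)}(\mathbf{s},\mathbf{t}), \Sigma_Y^{(u,v)}(\mathbf{s},\mathbf{t}), \Sigma_{XY}^{(u,v)}(\mathbf{s},\mathbf{t})\right) = 0 \quad \forall (\mathbf{s},\mathbf{t}) \in V.$$

Hence, by the definition (23) of $\psi(\cdot, \cdot, \cdot)$, $\Sigma_{XY}^{(u,v)}(\mathbf{s},\mathbf{t}) = \mathbf{0}$ on the interior of $V$, which is an open region
in $\mathbb{R}^p \times \mathbb{R}^q$ containing the origin. Thus, since the MT-functions $u(\cdot)$ and $v(\cdot)$ are chosen according to
(17) or (21), by Theorems 1 and 2 X and Y must be statistically independent under $P_{XY}$.

Conversely, if X and Y are statistically independent under $P_{XY}$, then by Property 3 of Proposition
1 we have that $\Sigma_{XY}^{(u,v)}(\mathbf{s},\mathbf{t}) = \mathbf{0}$ for all $(\mathbf{s},\mathbf{t}) \in V$, and in particular for $(\mathbf{s}^*,\mathbf{t}^*)$. Therefore, by (15),
$\rho_1\left(\Sigma_X^{(u,v)}(\mathbf{s}^*,\mathbf{t}^*), \Sigma_Y^{(u,v)}(\mathbf{s}^*,\mathbf{t}^*), \Sigma_{XY}^{(u,v)}(\mathbf{s}^*,\mathbf{t}^*)\right) = 0$. □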

F. Proof of Proposition 3: 

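It suffices to show that if the conditions in (31) and (32) are satisfied, then $\hat{\Sigma}_{XY}^{(u,v)} \to \Sigma_{XY}^{(u,v)}$ almost
surely as $N \to \infty$. Convergence proofs for $\hat{\Sigma}_X^{(u,v)}$ and $\hat{\Sigma}_Y^{(u,v)}$ are very similar and therefore omitted.
According to (28)-(30)

$$\lim_{N\to\infty} \hat{\Sigma}_{XY}^{(u,v)} = \lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mathbf{X}_n\mathbf{Y}_n^T \hat{\varphi}_{u,v}(\mathbf{X}_n, \mathbf{Y}_n) - \lim_{N\to\infty} \hat{\mu}_X^{(u,v)} \left(\lim_{N\to\infty} \hat{\mu}_Y^{(u,v)}\right)^T, \tag{51}$$

where

$$\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mathbf{X}_n\mathbf{Y}_n^T \hat{\varphi}_{u,v}(\mathbf{X}_n, \mathbf{Y}_n) = \frac{\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mathbf{X}_n\mathbf{Y}_n^T u(\mathbf{X}_n) v(\mathbf{Y}_n)}{\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} u(\mathbf{X}_n) v(\mathbf{Y}_n)}, \tag{52}$$

$$\lim_{N\to\infty} \hat{\mu}_X^{(u,v)} = \frac{\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mathbf{X}_n u(\mathbf{X}_n) v(\mathbf{Y}_n)}{\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} u(\mathbf{X}_n) v(\mathbf{Y}_n)}, \tag{53}$$

$$\lim_{N\to\infty} \hat{\mu}_Y^{(u,v)} = \frac{\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mathbf{Y}_n u(\mathbf{X}_n) v(\mathbf{Y}_n)}{\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} u(\mathbf{X}_n) v(\mathbf{Y}_n)}, \tag{54}$$

and it is assumed that the denominator

$$\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} u(\mathbf{X}_n) v(\mathbf{Y}_n) \neq 0 \quad \text{a.s.} \tag{55}$$

In the following, the limits of the series in the r.h.s. of (52)-(54) are obtained. Additionally, in Remark
3 below, we show that the assumption in (55) is satisfied. Since $(\mathbf{X}_n, \mathbf{Y}_n)$, $n = 1, \ldots, N$, is a sequence
of i.i.d. samples of $(\mathbf{X}, \mathbf{Y})$, the random matrices $\mathbf{X}_n\mathbf{Y}_n^T u(\mathbf{X}_n) v(\mathbf{Y}_n)$, $n = 1, \ldots, N$, in the r.h.s.
of (52), define a sequence of i.i.d. samples of $\mathbf{X}\mathbf{Y}^T u(\mathbf{X}) v(\mathbf{Y})$. Moreover, if $E[X_k^4; P_X] < \infty$ for any
$k = 1, \ldots, p$, $E[Y_l^4; P_Y] < \infty$ for any $l = 1, \ldots, q$, $E[u^4(\mathbf{X}); P_X] < \infty$, and $E[v^4(\mathbf{Y}); P_Y] < \infty$,
then

$$E[|X_k Y_l u(\mathbf{X}) v(\mathbf{Y})|; P_{XY}] \le \left(E[(X_k Y_l)^2; P_{XY}]\, E[(u(\mathbf{X}) v(\mathbf{Y}))^2; P_{XY}]\right)^{1/2} \le \left(E[X_k^4; P_X]\, E[Y_l^4; P_Y]\, E[u^4(\mathbf{X}); P_X]\, E[v^4(\mathbf{Y}); P_Y]\right)^{1/4} < \infty, \tag{56}$$

for any $k = 1, \ldots, p$ and any $l = 1, \ldots, q$, where the semi-inequalities stem from
the Hölder inequality for random variables [ ]. Therefore, by Khinchine's strong law of large numbers
(KSLLN) [33]

$$\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mathbf{X}_n\mathbf{Y}_n^T u(\mathbf{X}_n) v(\mathbf{Y}_n) = E[\mathbf{X}\mathbf{Y}^T u(\mathbf{X}) v(\mathbf{Y}); P_{XY}] \quad \text{a.s.} \tag{57}$$

Similarly, it can be shown that if the conditions in (31) and (32) are satisfied, then by the KSLLN

$$\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mathbf{X}_n u(\mathbf{X}_n) v(\mathbf{Y}_n) = E[\mathbf{X} u(\mathbf{X}) v(\mathbf{Y}); P_{XY}] \quad \text{a.s.}, \tag{58}$$

$$\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mathbf{Y}_n u(\mathbf{X}_n) v(\mathbf{Y}_n) = E[\mathbf{Y} u(\mathbf{X}) v(\mathbf{Y}); P_{XY}] \quad \text{a.s.}, \tag{59}$$

and

$$\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} u(\mathbf{X}_n) v(\mathbf{Y}_n) = E[u(\mathbf{X}) v(\mathbf{Y}); P_{XY}] \quad \text{a.s.} \tag{60}$$

Remark 3. By (60) and the assumption in (7) the denominator in the r.h.s. of (52)-(54) is non-zero
almost surely.

Therefore, since the sequences in the l.h.s. of (52)-(54) are obtained by continuous mappings of the
elements of the sequences in their r.h.s., then by (57)-(60), and the Mann-Wald Theorem [34]

$$\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \mathbf{X}_n\mathbf{Y}_n^T \hat{\varphi}_{u,v}(\mathbf{X}_n, \mathbf{Y}_n) = \frac{E[\mathbf{X}\mathbf{Y}^T u(\mathbf{X}) v(\mathbf{Y}); P_{XY}]}{E[u(\mathbf{X}) v(\mathbf{Y}); P_{XY}]} = E[\mathbf{X}\mathbf{Y}^T \varphi_{u,v}(\mathbf{X}, \mathbf{Y}); P_{XY}] \quad \text{a.s.}, \tag{61}$$

$$\lim_{N\to\infty} \hat{\mu}_X^{(u,v)} = \frac{E[\mathbf{X} u(\mathbf{X}) v(\mathbf{Y}); P_{XY}]}{E[u(\mathbf{X}) v(\mathbf{Y}); P_{XY}]} = E[\mathbf{X} \varphi_{u,v}(\mathbf{X}, \mathbf{Y}); P_{XY}] \quad \text{a.s.}, \tag{62}$$

and

$$\lim_{N\to\infty} \hat{\mu}_Y^{(u,v)} = \frac{E[\mathbf{Y} u(\mathbf{X}) v(\mathbf{Y}); P_{XY}]}{E[u(\mathbf{X}) v(\mathbf{Y}); P_{XY}]} = E[\mathbf{Y} \varphi_{u,v}(\mathbf{X}, \mathbf{Y}); P_{XY}] \quad \text{a.s.}, \tag{63}$$

where the last equalities in (61)-(63) follow from the definition of $\varphi_{u,v}(\mathbf{X}, \mathbf{Y})$ in (9).

Thus, since the sequence in the l.h.s. of (51) is obtained by continuous mappings of the elements
of the sequences in its r.h.s., then by (61)-(63), the Mann-Wald Theorem, and (14) it is concluded that
$\hat{\Sigma}_{XY}^{(u,v)} \to \Sigma_{XY}^{(u,v)}$ a.s. as $N \to \infty$. □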

G. The empirical MTCCA procedure with the exponential and Gaussian MT-functions 

Given N i.i.d. samples of X and Y, the empirical MTCCA procedure with the exponential and Gaussian
MT-functions was carried out via the following steps:
1) Estimate the optimal MT-functions parameters in (17) and (21) according to

$$(\mathbf{s}^*, \mathbf{t}^*) = \arg\max_{(\mathbf{s},\mathbf{t}) \in V} \psi\left(\hat{\Sigma}_X^{(u,v)}(\mathbf{s},\mathbf{t}), \hat{\Sigma}_Y^{(u,v)}(\mathbf{s},\mathbf{t}), \hat{\Sigma}_{XY}^{(u,v)}(\mathbf{s},\mathbf{t})\right), \tag{64}$$

where $\psi(\cdot, \cdot, \cdot)$ is defined in (23), and $\hat{\Sigma}_X^{(u,v)}(\mathbf{s},\mathbf{t})$, $\hat{\Sigma}_Y^{(u,v)}(\mathbf{s},\mathbf{t})$, and $\hat{\Sigma}_{XY}^{(u,v)}(\mathbf{s},\mathbf{t})$ are the estimates
in (26)-(28) of the covariance matrices $\Sigma_X^{(u,v)}(\mathbf{s},\mathbf{t})$, $\Sigma_Y^{(u,v)}(\mathbf{s},\mathbf{t})$, and $\Sigma_{XY}^{(u,v)}(\mathbf{s},\mathbf{t})$, respectively. The
maximization in (64) was carried out numerically using gradient ascent over the search region $V$,
which was selected as follows:

a) For the exponential MT-functions, we chose

$$V_E = \left\{\mathbf{s} \in \mathbb{R}^p, \mathbf{t} \in \mathbb{R}^q : \hat{J}_{XY}(\mathbf{s}, \mathbf{t}) \le D\right\},$$

where $D = \sqrt{2}$, and $\hat{J}_{XY}(\mathbf{s},\mathbf{t}) = 1 + \mathbf{s}^T\hat{\mu}_X + \mathbf{t}^T\hat{\mu}_Y + \frac{1}{2}\mathbf{s}^T\hat{R}_X\mathbf{s} + \mathbf{s}^T\hat{R}_{XY}\mathbf{t} + \frac{1}{2}\mathbf{t}^T\hat{R}_Y\mathbf{t}$ is a
quadratic empirical approximation of the joint moment generating function $M_{XY}(\mathbf{s},\mathbf{t})$ in (19).
The vectors $\hat{\mu}_X$ and $\hat{\mu}_Y$ denote the sample expectations of X and Y, respectively. The matrices
$\hat{R}_X$, $\hat{R}_Y$, and $\hat{R}_{XY}$ denote the sample auto-correlation matrix of X, the sample auto-correlation matrix
of Y, and their sample cross-correlation matrix, respectively. Since $D = \sqrt{2}$ and $\hat{J}_{XY}(\mathbf{s}, \mathbf{t})$ is quadratic
and takes a unit value at the origin, $V_E$ defines a closed region in $\mathbb{R}^p \times \mathbb{R}^q$ containing the
origin.

b) For the Gaussian MT-functions, the search region was set to

$$V_G = \left\{\mathbf{s} \in \mathbb{R}^p, \mathbf{t} \in \mathbb{R}^q : \nu(X_k, 5) \le s_k \le \nu(X_k, 95),\ \nu(Y_l, 5) \le t_l \le \nu(Y_l, 95),\ k = 1, \ldots, p,\ l = 1, \ldots, q\right\},$$

where $s_k$ and $t_l$ are the $k$-th and $l$-th entries of $\mathbf{s}$ and $\mathbf{t}$, respectively, and $\nu(X, \alpha)$ is the empirical
$\alpha$-th percentile of the random variable $X$. One can notice that $V_G$ defines a closed rectangle in
$\mathbb{R}^p \times \mathbb{R}^q$. In the considered examples it was verified that $V_G$ contains the origin. We note that in
case $V_G$ does not contain the origin, one can always subtract the expectations of X and Y
and perform MTCCA on $\mathbf{X}' = \mathbf{X} - E[\mathbf{X}; P_X]$ and $\mathbf{Y}' = \mathbf{Y} - E[\mathbf{Y}; P_Y]$.
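2) Obtain estimates of the MT-canonical correlation coefficients,

$$\hat{\rho}_k = \rho_k\left(\hat{\Sigma}_X^{(u,v)}(\mathbf{s}^*,\mathbf{t}^*), \hat{\Sigma}_Y^{(u,v)}(\mathbf{s}^*,\mathbf{t}^*), \hat{\Sigma}_{XY}^{(u,v)}(\mathbf{s}^*,\mathbf{t}^*)\right), \quad k = 1, \ldots, r,$$

and estimates of the MT-canonical directions,

$$(\hat{\mathbf{a}}_k, \hat{\mathbf{b}}_k), \quad k = 1, \ldots, r,$$

by solving the following GEVD equation

$$\begin{bmatrix} \mathbf{0} & \hat{\Sigma}_{XY}^{(u,v)}(\mathbf{s}^*,\mathbf{t}^*) \\ \hat{\Sigma}_{XY}^{(u,v)T}(\mathbf{s}^*,\mathbf{t}^*) & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{a} \\ \mathbf{b} \end{bmatrix} = \rho \begin{bmatrix} \hat{\Sigma}_X^{(u,v)}(\mathbf{s}^*,\mathbf{t}^*) & \mathbf{0} \\ \mathbf{0} & \hat{\Sigma}_Y^{(u,v)}(\mathbf{s}^*,\mathbf{t}^*) \end{bmatrix} \begin{bmatrix} \mathbf{a} \\ \mathbf{b} \end{bmatrix}, \tag{65}$$

where $\rho = \hat{\rho}_k$ is the $k$-th largest generalized eigenvalue of the pencil in (65), and $[\mathbf{a}^T, \mathbf{b}^T]^T = [\hat{\mathbf{a}}_k^T, \hat{\mathbf{b}}_k^T]^T$
is its corresponding generalized eigenvector.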



In all considered examples the width parameters $\sigma$ and $\tau$ of the Gaussian MT-functions (21) were set to
$\sigma = \frac{1}{p}\sum_{k=1}^{p} \hat{\sigma}(X_k)$ and $\tau = \frac{1}{q}\sum_{l=1}^{q} \hat{\sigma}(Y_l)$, where $\hat{\sigma}(X)$ denotes the empirical standard deviation of the random
variable $X$.
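The second step of the procedure can be sketched for the Gaussian MT-functions as follows, on data drawn from the model of simulation example 1. Several things here are assumptions rather than the procedure actually used: the MT-functions are taken to be $\exp(-\|\mathbf{x}-\mathbf{s}\|^2/(2\sigma^2))$ and $\exp(-\|\mathbf{y}-\mathbf{t}\|^2/(2\tau^2))$, the widths are set to the average empirical standard deviations, the parameters $(\mathbf{s}, \mathbf{t})$ are fixed by hand instead of being found by the gradient ascent of step 1, and the generalized eigenproblem is solved through Cholesky whitening and an SVD, which gives the same coefficients and directions when the covariance estimates are positive definite. All function names are hypothetical.

```python
import numpy as np


def gaussian_mt_weights(x, y, s, t, sigma, tau):
    """Normalized weights u(x_n) v(y_n) / sum_n u(x_n) v(y_n) (assumed Gaussian form)."""
    u = np.exp(-np.sum((x - s) ** 2, axis=1) / (2.0 * sigma ** 2))
    v = np.exp(-np.sum((y - t) ** 2, axis=1) / (2.0 * tau ** 2))
    w = u * v
    return w / w.sum()


def mt_covariances(x, y, w):
    """Weighted (measure-transformed) covariance and cross-covariance estimates."""
    xc = x - w @ x  # subtract the weighted mean of X
    yc = y - w @ y  # subtract the weighted mean of Y
    xw = xc * w[:, None]
    return xw.T @ xc, (yc * w[:, None]).T @ yc, xw.T @ yc


def canonical_pairs(sx, sy, sxy):
    """Canonical correlations and directions via whitening + SVD."""
    lx, ly = np.linalg.cholesky(sx), np.linalg.cholesky(sy)
    m = np.linalg.solve(ly, np.linalg.solve(lx, sxy).T).T  # lx^{-1} sxy ly^{-T}
    u, rho, vt = np.linalg.svd(m)
    r = len(rho)
    a = np.linalg.solve(lx.T, u[:, :r])    # columns satisfy a^T sx a = 1
    b = np.linalg.solve(ly.T, vt[:r].T)    # columns satisfy b^T sy b = 1
    return rho, a, b


rng = np.random.default_rng(0)
n = 1000
x = rng.standard_normal((n, 2))
y = np.column_stack([np.cos(x[:, 0]) + 0.1 * rng.standard_normal(n),
                     rng.standard_normal(n)])
sigma, tau = x.std(axis=0).mean(), y.std(axis=0).mean()
w = gaussian_mt_weights(x, y, s=np.array([1.5, 0.0]), t=np.zeros(2), sigma=sigma, tau=tau)
rho, a, b = canonical_pairs(*mt_covariances(x, y, w))
alignment = abs(a[0, 0]) / np.linalg.norm(a[:, 0])  # c(a_1, a_hat_1) with a_1 = [1, 0]
print(np.round(rho, 2), round(alignment, 3))
```

Centering the Gaussian weight away from the origin localizes the analysis to a region where the cosine is close to linear, so the first coefficient becomes large and the first direction aligns with $[1, 0]^T$, in contrast to the near-zero linear correlation of the untransformed data.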



H. Testing the statistical significance of the empirical canonical correlation coefficients 

Let $\mathbf{X}^N = \{\mathbf{X}_n\}_{n=1}^{N}$ and $\mathbf{Y}^N = \{\mathbf{Y}_n\}_{n=1}^{N}$ denote sequences of N i.i.d. samples of X and Y, respectively.
Additionally, let $\hat{\rho}_k(\mathbf{X}^N, \mathbf{Y}^N)$ denote the empirical $k$-th order canonical correlation coefficient
based on $\mathbf{X}^N$ and $\mathbf{Y}^N$. A bootstrap-based procedure for testing the statistical significance of the empirical
$k$-th order canonical correlation coefficient is specified below:

1) Repeat the following procedure $M$ times (with index $m = 1, \ldots, M$):

a) Generate a randomly permuted version of the sequence $\mathbf{Y}^N$, denoted by $\mathbf{Y}_m^N$.

b) Compute the statistic $\theta_m = \hat{\rho}_k(\mathbf{X}^N, \mathbf{Y}_m^N)$.

2) Construct an empirical cumulative distribution function from the sample statistics $\theta_m$, $m = 1, \ldots, M$,
as

$$F_\Theta(\theta) = \Pr(\Theta \le \theta) = \frac{1}{M}\sum_{m=1}^{M} 1_{x \ge 0}(x = \theta - \theta_m),$$

where $1_{x \ge 0}(\cdot)$ is an indicator function of its argument $x$.

3) Compute the p-value

$$p_0 = 1 - F_\Theta(\theta_0),$$

where $\theta_0 = \hat{\rho}_k(\mathbf{X}^N, \mathbf{Y}^N)$ is the true detection statistic.

4) If $p_0 < \alpha$, then $\hat{\rho}_k(\mathbf{X}^N, \mathbf{Y}^N)$ is significant at level $\alpha$, leading to rejection of the
null-hypothesis of no dependence between X and Y.

In all considered examples, the number of permutations $M$ and the significance level $\alpha$ were set to 1000
and 0.01, respectively.
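The four steps above can be sketched as follows, assuming NumPy. To keep the example short, the absolute Pearson correlation of two scalar sequences is used as a stand-in for the empirical canonical correlation coefficient; that substitution, the function names, the seeds, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def permutation_p_value(x, y, statistic, num_permutations=1000, seed=0):
    """p-value 1 - F(theta_0), with F the empirical CDF of the permuted statistics."""
    rng = np.random.default_rng(seed)
    theta_0 = statistic(x, y)                       # statistic on the original pairing
    thetas = np.array([
        statistic(x, y[rng.permutation(len(y))])    # steps 1a and 1b
        for _ in range(num_permutations)
    ])
    cdf_at_theta_0 = np.mean(thetas <= theta_0)     # step 2, evaluated at theta_0
    return 1.0 - cdf_at_theta_0                     # step 3


def abs_pearson(x, y):
    """Stand-in statistic (not a canonical correlation coefficient)."""
    return abs(np.corrcoef(x, y)[0, 1])


rng = np.random.default_rng(1)
x = rng.standard_normal(200)
y = x + rng.standard_normal(200)                    # clearly dependent toy data
p_dependent = permutation_p_value(x, y, abs_pearson)
print(p_dependent, p_dependent < 0.01)              # step 4 with alpha = 0.01
```

Permuting one sequence breaks the pairing between the samples while preserving both marginals, so the permuted statistics approximate the distribution of the coefficient under the null-hypothesis of no dependence.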